UNIVERSITY  OF 

ILLINOIS  LIBRARY 

AT URBANA-CHAMPAIGN


Report  No.  UIUCDCS-R-74-657 


NSF  -  OCA  -  GJ-36936  -  000005 


AN  EXPERIMENTAL  INFORMATION  RETRIEVAL  SYSTEM 


by 


William  Howard  Stellhorn 


July  1974 


Digitized  by  the  Internet  Archive 
in  2013 


http://archive.org/details/experimentalinfo657stel 




Department  of  Computer  Science 

University  of  Illinois  at  Urbana-Champaign 

Urbana,  Illinois  61801 


This  work  was  supported  in  part  by  the  National  Science  Foundation  under 
Grant  No.  US  NSF-GJ-36936. 




TABLE OF CONTENTS

PART I. User's Guide
1.0 Introduction
2.0 Data Base Organization
3.0 Interactive Search Language
3.1 FIND Instruction
3.1.1 Examples of Valid FIND Instructions
3.1.2 Examples of Invalid FIND Instructions
3.2 PRINT Instruction
3.3 Planned Extensions
4.0 Error Messages

PART II. Technical Description
1.0 Introduction
2.0 Table Searching
3.0 FIND Statement Processing
3.1 Tag Structure and Meaning
3.2 The TAG Table
3.3 The STRTXT Table
3.4 Processing Procedures
3.4.1 Tag Assignment -- The Level Table
3.4.2 Search Context Restriction -- The "IN" Clause
3.4.3 Syntax Checking
4.0 Search Scheduling
4.1 Scheduling Criteria
4.2 Changes in Scheduling Procedures
4.3 Success and Failure Linkage
4.4 Final Organization of Tag Table
4.5 Processing Procedures
5.0 PRINT Statement Processing
6.0 Search Control
6.1 Detailed Data Base Structure
6.1.1 C-Delimiters
6.1.2 D-Delimiters
6.1.3 E-Delimiters
6.2 Directory and Control Data
6.3 Search Control Procedures
6.3.1 Search Types
6.3.1.1 Type I Search
6.3.1.2 Type II Search
6.3.1.3 Type III Search
6.3.1.4 Type IV Search
6.3.2 Sentence and Paragraph Restrictions
7.0 PRINT Control
8.0 Implementation of Negation
9.0 System Errors
10.0 S-Level Debugging Facilities: the DUMP and CHANGE Instructions
10.1 DUMP Instruction
10.2 CHANGE Instruction

APPENDIX
A -- Flow Charts
B -- Register and Flag Assignments

LIST OF REFERENCES


LIST OF TABLES

3.1 Level Four Tag Assignments
3.2 Tag Assignments for Expression 3.1
3.3 Tag Table Entries for Expression 3.1 after Parsing Procedure
3.4 Parsing Procedures
3.5 Legal Successors
4.1 Success and Failure Linkage for Expression 4.1
4.2 Complete Tag Table for Expression 4.2
6.1 Context Delimiting Characters
6.2 Directory Contents
6.3 Control Word Assignments
6.4 Search Schedule for Expression 6.1
6.5 Search Control Classifications


LIST OF FIGURES

2.1 Document Structure
3.1 Tag Structure
6.1 Document Organization in Storage


PART  I.  User's  Guide 

1.0  Introduction 

This report describes the initial implementation of an experimental
system designed to support the development of effective algorithms for
retrieving information from a large variety of data bases, and especially from
data bases which have very little inherent structure. Because direct
sequential scanning of the data is expected to play some part in the operation
of such a retrieval system, an immediate goal of this program is to evaluate
quantitatively various strategies for efficient searching in this environment.
Another immediate goal is to study the interaction between the system and a
group of motivated users who are not necessarily experts either in computer
techniques or in the subject matter of the data base.

In this implementation, searching is performed by means of a
sequential scan through the complete text of the data base. The user has very
general control over the text of the search terms, the logical structure of
the search request and the contexts to be searched and printed. Any character
string may be used as a search term, and it is possible to locate co-occurrences
of terms within sentences or paragraphs as well as in larger document
subdivisions.

The system runs on a microprogrammable Burroughs D-Machine minicomputer
with 1024 64-bit words of microstore and 8192 16-bit words of main
memory. Peripherals include a card reader, a line printer and a disk with a
storage capacity of about 25 million bits.

A separately developed companion system, which uses an inverted
file organization and a considerably expanded inquiry language and which runs
on a DEC PDP-11, is also operational. Eventually the two systems might be
linked together.

Part  I  of  this  report  is  a  user's  manual  which  describes  the 
organization  of  data  and  the  use  of  the  inquiry  language  in  the  D-Machine 
retrieval  system.  Part  II  describes  the  operating  details  of  the  control 
program  with  special  attention  to  the  algorithms  which  schedule  and  control 
searching  and  which  may  be  used  to  investigate  search  strategies. 


2.0  Data  Base  Organization 

The  data  base  is  organized  as  a  collection  of  "documents",  each  of 
which  may  be  divided  into  several  sections  and  subsections  as  required  by  a 
particular  application.  The  remainder  of  this  discussion  will  deal  with  a 
collection  of  technical  articles  although  the  data  formatting  system  described 
could  be  applied  equally  well  to  other  contexts. 

Each article enters the system directly in its full original form
with a number of special characters, or context delimiters, inserted for the
purpose of identifying the beginnings and ends of the various sections. These
context delimiters are arranged in the hierarchical structure illustrated in
Figure 2.1. Words printed in capital letters in the figure are section names,
and any of these names may be specified by the user as an area to be searched
or printed by means of commands to be described in Section 3.0. Two or more
names shown together on a single line are synonymous and may be used
interchangeably. When one context name in the figure is indented relative to
another, the former section is completely contained within the latter, and is
implicitly included in any reference to the latter. For example, TEXT includes
ABSTRACT, BODY, and NOTES but excludes reference lists, index terms,
bibliographic data, etc. The terms DOCUMENT and ARTICLE are interchangeable and
include all other sections.

Using this system, a person may restrict his search to titles, authors'
names, full bibliographic citations, abstracts, keyword lists, etc.; or he may
include the entire contents of the data base. Searching can also be performed
within sentences or paragraphs, in which case each sentence (paragraph) in
Sections D, E and F is searched individually in accordance with the search
request.


DOCUMENT or ARTICLE
   A. DATA              (bibliographic)
      1. AUTHOR
      2. TITLE
      3. SOURCE         (publication)
      4. DATE
      5. PAGE or PAGES
      6. MISC           (any other bibliographic data in the file)
   B. INDEX             (not currently used; may be used later
                         for a concordance)
   C. KEYS              (keyword list published as part of the
                         document)
   D. TEXT
      1. ABSTRACT
      2. BODY           (text of document)
      3. NOTES          (footnotes)
   E. REFERENCES or REFS
   F. COMMENTS          (reserved for user's comments to be
                         recorded with the file)

Items D, E, and F may be further subdivided by PARAGRAPH and
SENTENCE.

Figure 2.1. Document Structure


Some  terms  in  Figure  2.1  require  explanation.  Sections  B  and  C 
(INDEX,  KEYS)  are  reserved  tentatively  for  two  different  systems  of  index 
terms  which  might  be  attached  to  the  document  by  the  author,  an  independent 
agency  or  the  retrieval  system  itself.  Neither  section  is  used  at  the  present 
time,  and  they  are  available  mainly  for  experimental  purposes  and  for  system 
growth.  Section  F  (COMMENTS)  is  provided  for  use  with  a  note-taking  facility 
which may be added to the system at a later date. This would allow a user to
enter  comments  of  his  own  for  future  retrieval  with  a  document.  Such  comments 
might  be  stored  permanently  with  the  original  text  (in  a  small  private  system), 
or  they  might  be  loaded  from  a  user's  personal  files  during  the  LOGON  procedure. 


3.0  Interactive  Search  Language 

The  interactive  language  provides  for  communication  between  the  user 
and  the  retrieval  program.  Two  instructions,  FIND  and  PRINT,  are  currently 
available;  and  several  others  which  require  the  use  of  disk  facilities  will  be 
provided  soon.  Throughout  this  description  of  the  search  language  instructions, 
the  symbols  "<  >"  are  used  to  indicate  that  some  information  is  to  be  supplied 
by  the  user;  and  square  brackets,  [  ],  are  used  to  indicate  that  the  information 
enclosed  is  optional.  These  symbols  are  never  part  of  the  required  input.  The 
keywords  "FIND",  "PRINT",  and  "IN"  must  always  be  used  as  shown  in  the  sample 
instructions  and  must  be  followed  by  at  least  one  blank.  Other  blanks  are 
ignored  except  when  they  appear  between  apostrophes  as  part  of  a  search  term. 

3.1  FIND  Instruction 

The  FIND  instruction  causes  the  system  to  search  for  the  occurrence 
of  one  or  more  character  strings  specified  by  the  user.  The  search  may  be 
conducted  in  the  entire  document  or  may  be  restricted  to  particular  sections  or 
subsections.  The  format  for  the  FIND  instruction  is: 

FIND  <  logical  expression  >  [IN  <  context  name  >  ]. 

The  "logical  expression"  required  here  consists  of  character  strings 
to  be  located  in  the  text,  enclosed  by  apostrophes,  and  separated  by  the  symbols 
"*"  (logical  AND)  and  "+"  (logical  OR).  Parentheses  may  be  used  freely  to  group 
terms  in  any  way  desired  by  the  user.  In  the  absence  of  parentheses,  the 
operator  "*"  is  considered  to  be  dominant  over  the  operator  "+",  i.e., 

X  +  Y  *  Z 
is  equivalent  to 

X  +  (Y*Z). 
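The precedence rule can be made concrete with a small recursive-descent parser. This is only an illustration under our own assumptions, not the parser used by the retrieval program (Part II describes its own table-driven procedure); the names tokenize and parse_find_expression and the nested-tuple result are hypothetical.

```python
import re

# A token is either a quoted search term (a doubled apostrophe inside the
# term stands for one apostrophe) or one of the symbols * + ( ).
# Blanks outside apostrophes are skipped.
TOKEN = re.compile(r"\s*(?:'((?:[^']|'')*)'|([*+()]))")


def tokenize(expr):
    """Split a logical expression into (kind, value) pairs."""
    tokens, pos = [], 0
    expr = expr.rstrip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"invalid character at position {pos}")
        if m.group(1) is not None:
            tokens.append(("TERM", m.group(1).replace("''", "'")))
        else:
            tokens.append((m.group(2), m.group(2)))
        pos = m.end()
    return tokens


def parse_find_expression(expr):
    """Parse into nested tuples; '*' (AND) binds tighter than '+' (OR)."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():                      # expr := and-group ('+' and-group)*
        operands = [parse_and()]
        while peek() == "+":
            advance()
            operands.append(parse_and())
        return operands[0] if len(operands) == 1 else ("OR", *operands)

    def parse_and():                     # and-group := primary ('*' primary)*
        operands = [parse_primary()]
        while peek() == "*":
            advance()
            operands.append(parse_primary())
        return operands[0] if len(operands) == 1 else ("AND", *operands)

    def parse_primary():                 # primary := TERM | '(' expr ')'
        kind = peek()
        if kind == "TERM":
            return advance()
        if kind == "(":
            advance()
            node = parse_or()
            if peek() != ")":
                raise ValueError("unbalanced parentheses")
            advance()
            return node
        raise ValueError(f"unexpected token {kind!r}")

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected tokens after expression")
    return tree
```

For example, parse_find_expression("'X' + 'Y' * 'Z'") returns ('OR', ('TERM', 'X'), ('AND', ('TERM', 'Y'), ('TERM', 'Z'))), which is the grouping X + (Y*Z).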


Connecting  two  or  more  search  strings  by  "*"  will  cause  a  document 
to  be  retrieved  only  if  all  the  strings  so  connected  are  present  together  in 
the  context  specified.  Joining  two  or  more  strings  by  "+"  will  cause  retrieval 
of  a  document  if  any  of  the  requested  strings  is  present.  Currently,  the 
logical  operator  "NOT"  is  not  available,  although  provision  has  been  made  in 
the  control  program  to  include  it  at  a  later  time. 
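These semantics can be sketched as a recursive evaluator. It assumes, hypothetically, that a request has already been parsed into nested tuples such as ("AND", ("TERM", "TERM A"), ("TERM", "TERM B")): a term is satisfied by any occurrence of its string in the context, "*" requires every operand and "+" requires at least one.

```python
def satisfies(node, context):
    """Return True if the context text satisfies the request tree."""
    kind = node[0]
    if kind == "TERM":
        # A search term is a plain substring; word boundaries are ignored.
        return node[1] in context
    if kind == "AND":
        return all(satisfies(child, context) for child in node[1:])
    if kind == "OR":
        return any(satisfies(child, context) for child in node[1:])
    raise ValueError(f"unknown node type {kind!r}")


# Hypothetical request: 'SHAR' * ('COMPUT' + 'SYSTEM')
request = ("AND", ("TERM", "SHAR"), ("OR", ("TERM", "COMPUT"), ("TERM", "SYSTEM")))
print(satisfies(request, "A TIMESHARING SYSTEM"))   # True
print(satisfies(request, "A TIMESHARING MACHINE"))  # False
```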

Since the retrieval program was designed to operate on many different
kinds of data bases, there is no restriction on the character strings which may
be sought except that the requested string must be entered exactly as it appears
in the text. For example, the string 'SHAR' could be used to retrieve any or
all of the following: SHARE, SHARED, SHARING, TIMESHARING, etc. These strings
need not observe word boundaries; prefixes, suffixes, or both may be dropped;
punctuation marks may be included; and the strings requested may overlap. Note,
however, that a search string may not extend from one sentence or paragraph to
another because these context units are separated in the text by special
characters which cannot be entered from the input terminal. Since apostrophes are
used to indicate the ends of the search strings, apostrophes which are to be
included in a search string itself must be typed twice. For example, to locate
the word ISN'T, type: FIND 'ISN''T' ... .
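The apostrophe convention can be sketched as a pair of helper functions (hypothetical names, shown only to illustrate the rule): quote_term produces the typed form of a literal string, and unquote_term recovers the string, rejecting a lone apostrophe inside a term. The last line checks the unanchored matching described above.

```python
def quote_term(text):
    """Return the form in which a literal search string must be typed."""
    # Apostrophes delimit terms, so an apostrophe inside a term is doubled.
    return "'" + text.replace("'", "''") + "'"


def unquote_term(typed):
    """Recover the literal search string from its typed form."""
    if len(typed) < 2 or typed[0] != "'" or typed[-1] != "'":
        raise ValueError("a search term must be enclosed in apostrophes")
    body = typed[1:-1]
    if "'" in body.replace("''", ""):
        raise ValueError("an apostrophe inside a term must be typed twice")
    return body.replace("''", "'")


print(quote_term("ISN'T"))  # 'ISN''T'
# A term matches anywhere in the text, regardless of word boundaries.
print(all("SHAR" in w for w in ["SHARE", "SHARED", "SHARING", "TIMESHARING"]))  # True
```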

The use of IN followed by a context name is an optional feature which
allows the user to restrict his search to selected document sections as defined
in Figure 2.1. By means of the "IN" clause, a search may be confined to
document titles, authors' names, abstracts or other document subdivisions. When
PARAGRAPH or SENTENCE is specified after IN, the search logic defined in the
"logical expression" is applied separately to each paragraph (sentence) in each
document searched. Hence if the user requests

FIND   'TERM  A'  *  'TERM  B'  IN  SENTENCE, 

the strings 'TERM A' and 'TERM B' must both occur in the same sentence in order
for a document to respond. A document which contains both 'TERM A' and
'TERM B', but not in the same sentence, will not be retrieved by this request.
If, however, the user requests

FIND   'TERM  A'  +  'TERM  B'  IN  SENTENCE, 

any document which contains either 'TERM A' or 'TERM B' anywhere between
sentence boundaries will respond. The effect of the "IN" clause in this case
would simply be to restrict the search to TEXT (ABSTRACT, BODY, NOTES),
REFERENCES, and COMMENTS sections since these are the only sections which
contain SENTENCE subdivisions.
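The difference between these two requests can be sketched as follows. For illustration only, the sketch assumes that a section is held as a list of sentence strings and that a request is a Python predicate on a piece of text; the retrieval program itself works on delimiter characters stored in the text. Joining with a character that cannot be typed ("\x00", a stand-in) keeps a search string from spanning two sentences.

```python
SENTENCE_BREAK = "\x00"  # stand-in for a delimiter that cannot be typed


def responds_in_sentence(sentences, request):
    """IN SENTENCE: some single sentence must satisfy the whole request."""
    return any(request(s) for s in sentences)


def responds_anywhere(sentences, request):
    """No IN clause: the terms may come from different sentences."""
    return request(SENTENCE_BREAK.join(sentences))


doc = ["TERM A APPEARS HERE.", "TERM B APPEARS THERE."]


def both(s):    # 'TERM A' * 'TERM B'
    return "TERM A" in s and "TERM B" in s


def either(s):  # 'TERM A' + 'TERM B'
    return "TERM A" in s or "TERM B" in s


print(responds_in_sentence(doc, both))    # False
print(responds_anywhere(doc, both))       # True
print(responds_in_sentence(doc, either))  # True
```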

Only  one  context  name  from  the  list  in  Figure  2.1  may  be  specified  in 
an  "IN"  clause.  If  no  "IN"  clause  is  supplied,  the  default  is  ARTICLE. 

3.1.1  Examples  of  Valid  FIND  Instructions 

FIND 'KWIC' + 'KEY' * 'WORD' * 'CONTEXT' IN TITLE

FIND     'FIND' 

FIND ('ON-LINE'+'REAL TIME'+'TIME SHAR')*('COMPUT'+'PROCESS'
+'SYSTEM')

This last example would restrict retrieval to those documents
employing the spelling conventions shown; it would not retrieve articles
referring to "REAL-TIME SYSTEMS" or to "REALTIME SYSTEMS". Two other
formulations that would avoid this restriction are:


FIND ('ON'*'LINE' + 'REAL'*'TIME' + 'TIME'*'SHAR')*('COMPUT'
+ 'PROCESS' + 'SYSTEM')

and 

FIND ('ON'*'LINE' + 'TIME'*('REAL' + 'SHAR')) * ('COMPUT'
+ 'PROCESS' + 'SYSTEM')
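The last two formulations are logically equivalent; they differ only by the distributive law 'TIME'*'REAL' + 'TIME'*'SHAR' = 'TIME'*('REAL' + 'SHAR'). A brute-force truth-table check confirms this (a sketch in which each Boolean argument stands for "this string occurs in the context searched").

```python
from itertools import product


def second_form(on, line, real, time, shar, comput, process, system):
    # ('ON'*'LINE' + 'REAL'*'TIME' + 'TIME'*'SHAR')*('COMPUT' + 'PROCESS' + 'SYSTEM')
    return ((on and line) or (real and time) or (time and shar)) and (
        comput or process or system)


def third_form(on, line, real, time, shar, comput, process, system):
    # ('ON'*'LINE' + 'TIME'*('REAL' + 'SHAR')) * ('COMPUT' + 'PROCESS' + 'SYSTEM')
    return ((on and line) or (time and (real or shar))) and (
        comput or process or system)


# Try every combination of the eight strings being present or absent.
equivalent = all(
    second_form(*values) == third_form(*values)
    for values in product([False, True], repeat=8)
)
print(equivalent)  # True
```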

3.1.2  Examples  of  Invalid  FIND  Instructions 

FIND  RETRIEVAL 

(no  apostrophes  around  "RETRIEVAL") 

FIND 'BONE MARROW' AND 'TRANSPLANT'

("*"  is  the  required  symbol  for  logical  "AND") 

FIND'ZIPF'  IN  AUTHOR 

(blank  must  follow  "FIND") 

FIND  ('TIME'  *  ('REAL'  +  'SHAR')  *  'COMPUT' 

(  a  "("  occurs  without  a  corresponding  ")"  ) 

3.2  PRINT  Instruction 

The  PRINT  instruction  indicates  to  the  control  program  which  sections 
of  a  responding  document  are  to  be  printed  after  a  search.  Its  format  is: 

PRINT  <  context  list  > 

where  the  "context  list"  consists  of  one  or  more  context  names  (Figure  2.1) 
separated  by  blanks  or  commas.  Any  number  of  context  names  may  be  specified 
and  in  any  order. 

Two  asterisks  (**)  are  placed  in  the  margin  of  the  printed  copy  beside 
every  line  which  contains  any  search  string  specified  in  the  associated  FIND 
instruction. 
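A sketch of the margin marking, using the hypothetical function name mark_lines. It assumes the printed output is already split into lines; checked line by line like this, a search string broken across two printed lines would not be marked.

```python
def mark_lines(lines, terms, marker="**"):
    """Flag each output line that contains any search string."""
    blank = " " * len(marker)
    marked = []
    for line in lines:
        hit = any(t in line for t in terms)
        marked.append((marker if hit else blank) + " " + line)
    return marked


for row in mark_lines(["ON-LINE RETRIEVAL", "IS DISCUSSED BELOW"], ["ON-LINE"]):
    print(row)
# ** ON-LINE RETRIEVAL
#    IS DISCUSSED BELOW
```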

Normally,  no  context  unit  will  be  printed  more  than  once.  Two  or  more 
equivalent  or  overlapping  context  names  may  be  specified;  however,  in  such  cases 
the most general name in the hierarchy will be selected for printing, and the
other  related  terms  will  be  ignored.  For  example,  if  both  SENTENCE  and  PARAGRAPH 
are  selected,  the  result  will  be  the  same  as  if  PARAGRAPH  alone  had  been  requested. 
Similarly,  the  following  PRINT  statements  are  all  equivalent  since  TEXT  is  composed 
of  the  three  sections  ABSTRACT,  BODY  and  NOTES: 

PRINT  TEXT 

PRINT  TEXT,  ABSTRACT 

PRINT ABSTRACT, BODY, NOTES

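The naming conventions above can be sketched as a reduction over the hierarchy of Figure 2.1. The tables and function names are hypothetical, and only name-level rules are modeled (synonyms, a section implied by all of its parts, DOCUMENT overriding everything, PARAGRAPH overriding SENTENCE); the FIND-dependent handling of PARAGRAPH and SENTENCE described below, and the print order, are not.

```python
import re

# Parent of each section name, transcribed from Figure 2.1.
PARENT = {
    "DATA": "DOCUMENT", "INDEX": "DOCUMENT", "KEYS": "DOCUMENT",
    "TEXT": "DOCUMENT", "REFERENCES": "DOCUMENT", "COMMENTS": "DOCUMENT",
    "AUTHOR": "DATA", "TITLE": "DATA", "SOURCE": "DATA",
    "DATE": "DATA", "PAGES": "DATA", "MISC": "DATA",
    "ABSTRACT": "TEXT", "BODY": "TEXT", "NOTES": "TEXT",
}
SYNONYMS = {"ARTICLE": "DOCUMENT", "REFS": "REFERENCES", "PAGE": "PAGES"}
CHILDREN = {}
for child, parent in PARENT.items():
    CHILDREN.setdefault(parent, set()).add(child)


def ancestors(name):
    """Yield every section that contains the named section."""
    while name in PARENT:
        name = PARENT[name]
        yield name


def reduce_print_list(context_list):
    """Return the set of context names that will actually be printed."""
    # Context names may be separated by blanks or commas.
    names = {SYNONYMS.get(n, n) for n in re.split(r"[\s,]+", context_list) if n}
    # Requesting every part of a section is the same as requesting the
    # section itself; repeat until nothing changes.
    changed = True
    while changed:
        changed = False
        for parent, children in CHILDREN.items():
            if parent not in names and children <= names:
                names.add(parent)
                changed = True
    if "DOCUMENT" in names:
        return {"DOCUMENT"}        # whole document; other requests ignored
    if "PARAGRAPH" in names:
        names.discard("SENTENCE")  # PARAGRAPH is selected over SENTENCE
    # Drop any name already contained in another requested name.
    return {n for n in names if not any(a in names for a in ancestors(n))}


print(sorted(reduce_print_list("ABSTRACT, BODY, NOTES")))  # ['TEXT']
```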
Because  of  the  extent  to  which  the  user  can  control  the  contexts  in  which 
searching  and  printing  take  place,  it  has  been  necessary  to  establish  conventions 
for  handling  a  number  of  situations  in  which  the  intended  action  is  not  clearly 
defined.  For  example,  what  is  the  meaning  of  the  request: 

FIND  'TERM  A'  *  'TERM  B'  IN  TEXT 

PRINT  SENTENCE  ? 

In such cases the nature of the printed output depends upon the context of the
FIND instruction as well as the contexts in the PRINT request.

The rules which govern printing under various conditions are given
below. Regardless of the order in which contexts are given in the PRINT
statement, they are processed in the order shown. Throughout this discussion,
the phrase "major context unit" refers to any of the following: DOCUMENT,
ARTICLE, DATA, INDEX, KEYS, TEXT, ABSTRACT, BODY, NOTES, REFERENCES or COMMENTS;
"minor context unit" refers to SENTENCE or PARAGRAPH.

DOCUMENT,  ARTICLE 

The  entire  document  is  printed,  and  all  other  printing  requests  are 
ignored. 

DATA

The  complete  bibliographic  data  section  is  printed,  and  all  separate 
requests  for  individual  bibliographic  items,  such  as  TITLE  or  AUTHOR,  are  ignored. 

AUTHOR,  TITLE,  SOURCE,  DATE,  PAGES,  MISC. 

Selected  items  are  printed  in  the  order  shown. 

INDEX, KEYS, ABSTRACT

These  major  context  units  are  printed  in  the  order  shown.  Note  that  a 
request  to  print  ABSTRACT  always  produces  a  complete  copy  of  the  abstract.  As 
explained  below,  this  is  not  true  of  other  major  document  sections  which  contain 
paragraph  and  sentence  subdivisions. 

PARAGRAPH,  SENTENCE 

The processing of print requests for PARAGRAPH and SENTENCE depends both
upon what other contexts are to be printed and upon what context has been
specified in the FIND instruction. Print requests for PARAGRAPH or SENTENCE do
not affect the printing of DOCUMENT, ARTICLE, or ABSTRACT; but they supersede
requests for TEXT, BODY, NOTES, REFERENCES and COMMENTS. If both PARAGRAPH and
SENTENCE are requested, PARAGRAPH is selected.

The  following  paragraphs  describe  the  interaction  of  the  instruction 
"PRINT  PARAGRAPH"  with  the  various  contexts  that  may  be  selected  in  the  FIND 
instruction.  Similar  remarks  apply  to  "PRINT  SENTENCE". 

If the FIND context is DATA, INDEX or KEYS (major context units which
do not contain paragraphs), no output is produced.

If  the  FIND  context  is  any  bibliographic  subdivision  (TITLE,  AUTHOR, 
etc.),  the  full  bibliographic  citation  is  printed. 

If the FIND context is any other major context unit, all paragraphs are
printed  which  lie  within  that  context  unit  and  which  contain  any  of  the  search 
strings  requested  in  the  FIND  statement,  regardless  of  the  Boolean  relationships 
that  may  have  been  specified  there. 

If  the  FIND  context  is  PARAGRAPH  or  SENTENCE  and  the  PRINT  context  is 
PARAGRAPH,  only  those  paragraphs  which  completely  satisfy  the  search  request, 
including  the  Boolean  relationships,  or  which  contain  sentences  that  do  so  are 
printed.  Similarly,  if  the  FIND  and  PRINT  contexts  are  both  SENTENCE,  only 
sentences  which  satisfy  the  search  request  are  printed. 

If the FIND context is PARAGRAPH and the PRINT context is SENTENCE,
then from those paragraphs that satisfy the search request every sentence
containing any of the requested search strings is printed.
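The three cases in which paragraphs are actually printed can be sketched as a selection function. This illustrates the rules under stated assumptions and is not the report's print control: the paragraphs passed in are assumed to be those lying within the FIND context unit, each held as a list of sentence strings; the request is a Python predicate; and the DATA/INDEX/KEYS and bibliographic cases are omitted. The function name is hypothetical.

```python
def paragraphs_to_print(paragraphs, terms, request, find_context):
    """Select paragraphs for "PRINT PARAGRAPH".

    paragraphs   -- paragraphs within the FIND context, each a list of sentences
    terms        -- the search strings named in the FIND instruction
    request      -- predicate: does a piece of text satisfy the full request?
    find_context -- the context named in the FIND instruction
    """
    selected = []
    for para in paragraphs:
        if find_context == "PARAGRAPH":
            # "\x00" stands in for a sentence delimiter no term can span.
            hit = request("\x00".join(para))
        elif find_context == "SENTENCE":
            hit = any(request(s) for s in para)
        else:
            # Other major units: Boolean structure is ignored, and any
            # paragraph containing any search string is printed.
            hit = any(t in s for s in para for t in terms)
        if hit:
            selected.append(para)
    return selected


paras = [["TERM A HERE.", "TERM B THERE."], ["ONLY TERM A."], ["NEITHER."]]
terms = ["TERM A", "TERM B"]


def request(text):  # hypothetical request 'TERM A' * 'TERM B'
    return "TERM A" in text and "TERM B" in text


print(len(paragraphs_to_print(paras, terms, request, "TEXT")))       # 2
print(len(paragraphs_to_print(paras, terms, request, "PARAGRAPH")))  # 1
print(len(paragraphs_to_print(paras, terms, request, "SENTENCE")))   # 0
```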

TEXT,  BODY,  NOTES,  REFERENCES,  COMMENTS 

In  the  absence  of  print  requests  for  SENTENCE  or  PARAGRAPH,  these 
major  context  units  are  printed  in  the  order  shown.  If  TEXT  is  specified,  separate 
requests  for  BODY  and  NOTES  are  ignored. 

At  the  present  time,  only  the  control  routine  can  initiate  a  PRINT 
instruction:  it  does  this  as  part  of  its  preparation  for  a  search.  In  this 
mode,  the  system  types  the  question,  "PRINT?",  and  the  user  responds  by  typing 
a  context  list.  A  later  version  of  the  program  will  allow  the  user  to  request 
directly  the  printing  of  specified  documents  or  document  subsections  and,  in 
particular,  of  documents  which  responded  in  any  of  several  previous  searches. 

3.3  Planned  Extensions 

Several  new  features  will  be  added  to  the  basic  system  shortly  in 
order  to  increase  the  user's  ability  to  control  a  search.  FIND  instructions 
will be numbered sequentially, and the text of each question will be saved
along  with  a  list  of  documents  which  responded  to  that  question.  It  will  be 
possible  to  specify  that  a  new  search  is  to  be  restricted  to  the  particular 
documents  specified  or  to  the  set  of  documents  retrieved  in  any  previous  search. 
It  will  also  be  possible  to  combine  the  results  of  several  previous  questions, 
using  logical  AND,  OR  and  NOT,  in  order  to  produce  new  document  sets  which  may 
be  searched  subsequently. 
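The planned combination of earlier results maps naturally onto set operations. In the sketch below the question numbers, document numbers and function name are all hypothetical, and NOT is taken to mean "retrieved by the first question but not by the second".

```python
# Hypothetical saved results: question number -> set of responding documents.
results = {
    1: {3, 8, 15, 21},
    2: {8, 21, 40},
    3: {15, 40, 52},
}


def combine(op, a, b):
    """Combine the document sets of two earlier questions."""
    if op == "AND":
        return results[a] & results[b]
    if op == "OR":
        return results[a] | results[b]
    if op == "NOT":  # retrieved by question a but not by question b
        return results[a] - results[b]
    raise ValueError(f"unknown operator {op!r}")


# The combined set can be saved as a new document set for later searching.
results[4] = combine("AND", 1, 2)
print(sorted(results[4]))  # [8, 21]
```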

One  other  feature  planned  for  the  system  is  the  COMMENT  facility. 
This  command  is  visualized  as  an  underlining  and  note-taking  facility  which  will 
allow  the  user  to  enter  any  remarks  he  may  wish  to  make  for  storage  directly  with 
the  text  of  a  document.  In  the  initial  implementation  of  the  system,  disk  space 
will  be  reserved  at  the  end  of  each  document  for  comments.  These  comments  will 
be entered along with some user identification and will become a semi-permanent,
searchable  part  of  the  text.  Eventually,  with  the  aid  of  graphics  terminals, 
entry  of  some  comments  may  be  performed  very   much  like  underlining  in  a  book; 
and  it  may  be  possible  to  transfer  parts  of  the  original  text  into  a  user's 
private  file  for  later  reference  and  use. 

4.0 Error Messages

This  section  contains  a  list  of  retrieval  system  error  messages  and 
their  interpretations.  Whenever  possible,  the  system  places  an  up-arrow  under 
the  input  character  position  at  which  the  error  was  detected. 


MESSAGE                        NOTES

Invalid Character or Keyword   An undefined (possibly misspelled) keyword or
                               context name or an undefined operator symbol
                               has been detected.

Invalid Successor              A search request is not well formed. This
                               message results from syntactic errors such as
                               two successive operators, two successive search
                               terms, the keyword "FIND" followed immediately by
                               an IN clause, etc.

Missing                        The total number of apostrophes in a search
                               expression is odd, indicating an error in
                               specifying some search term.

Unbalanced ( )'s               The total numbers of right and left parentheses
                               in a search expression are not equal, or a right
                               parenthesis has been detected before a
                               corresponding left parenthesis.

Too Many ( ) Levels            Parenthesized quantities are nested to a depth
                               greater than 13.

Too Many Terms or ( )'s        The total number of search terms and
                               parenthesized quantities in a search expression
                               exceeds 30.

Too Many Disjunctions          More than 256 terms and parenthesized
                               expressions are joined together by "+" (OR).
                               There is no corresponding restriction on the use
                               of "*" (AND). In any event, other limits will
                               normally be exceeded before this one.

STRTXT Table Full              Storage capacity for search terms has been
                               exceeded. The total number of characters allowed
                               is approximately 100, depending upon the lengths
                               of individual terms.

Stack Full                     Insufficient temporary storage space is available
                               for the search scheduling procedure. The user
                               should simplify the form of the search request,
                               if possible, or resubmit it as two or more
                               separate inquiries.

System                         A programming or data base formatting error has
                               been detected. Correction by system maintenance
                               personnel is required. Temporarily, the user may
                               be able to complete his search by removing or
                               changing restrictions on the contexts to be
                               searched or printed.
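Several of these checks can be sketched in a few lines. The sketch is ours, not the report's syntax checker (Part II describes one based on a table of legal successors): it reproduces the apostrophe, parenthesis and limit checks using the limits quoted above, reports any other unquoted character as invalid, and does not attempt the Invalid Successor, Too Many Disjunctions, STRTXT Table Full or Stack Full checks. The order of the checks, and the message chosen for a missing blank after FIND, are assumptions.

```python
MAX_DEPTH = 13   # nesting limit quoted in the table above
MAX_ITEMS = 30   # limit on search terms plus parenthesized quantities


def check_find(line):
    """Return the first error message detected in a FIND instruction,
    or None if none of these particular checks fails."""
    if not line.startswith("FIND "):
        return "Invalid Character or Keyword"   # assumed message
    if line.count("'") % 2:                     # odd number of apostrophes
        return "Missing"
    text = line[len("FIND "):]
    depth = items = 0
    in_term = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "'":
            if in_term and text[i + 1:i + 2] == "'":
                i += 2                          # doubled apostrophe in a term
                continue
            in_term = not in_term
            if in_term:
                items += 1                      # count each term once
        elif not in_term:
            if text.startswith("IN ", i):
                break                           # a context name follows
            if ch == "(":
                depth += 1
                items += 1
                if depth > MAX_DEPTH:
                    return "Too Many ( ) Levels"
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    return "Unbalanced ( )'s"
            elif ch not in "*+ ":
                return "Invalid Character or Keyword"
        i += 1
    if depth != 0:
        return "Unbalanced ( )'s"
    if items > MAX_ITEMS:
        return "Too Many Terms or ( )'s"
    return None


print(check_find("FIND ('TIME' * ('REAL' + 'SHAR') * 'COMPUT'"))  # Unbalanced ( )'s
```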

PART II. Technical Description

1.0 Introduction

The  retrieval  program  described  in  this  report  is  the  initial 
implementation  of  an  experimental  system  designed  to  support  the  development 
of  effective  algorithms  for  retrieving  information  from  a  large  variety  of 
data  bases,  and  especially  from  data  bases  which  have  very  little  inherent 
structure.  It  is  felt  that  direct  sequential  scanning  of  the  data  will 
necessarily  play  some  part  in  the  operation  of  such  a  general  retrieval  system, 
and  an  immediate  goal  of  this  program  is  to  evaluate  quantitatively  various 
strategies  for  scheduling  a  complicated  search  request  in  this  environment. 
Section  4  discusses  in  detail  the  facilities  provided  for  studying  this 
problem. 

Another  immediate  goal  is  to  study  in  a  controlled  environment  the 
interaction  between  the  system  and  a  group  of  motivated  users  who  are  not 
necessarily  experts  either  in  computer  techniques  or  in  the  subject  matter  of 
the  data  base. 

These research goals, together with the anticipated characteristics
of the computer system to be used, have led to the following design
specifications and constraints:

1. Searching is to be performed by means of a sequential scan
through the complete text of the data base.

2. The user should have very general control over the text of the
search terms, the logical structure of the search request, and
the contexts to be searched and printed.

3. It should be possible to search for co-occurrences of terms
within sentences and paragraphs (or comparable subdivisions
in a non-textual data base) as well as individual items of
bibliographic data and larger document subdivisions.

4. The user should be given some feedback, in the form of printed
output, immediately after a document is retrieved. (This is
probably impractical in a large system where a single inquiry
may retrieve several hundred citations. The more common
practice is to report to the user the number of citations
retrieved by a search and allow him to modify his inquiry, request
printed output or take some other action.)

5. A special symbol is to be placed in the margin of each line of
printed output which contains any of the requested search terms.

6. No part of any document is to be printed more than once in
response to any given search request, even though the user
may specify overlapping print contexts such as ABSTRACT and
SENTENCE or ABSTRACT and TEXT. (See Part I of this report.)

7. System resources, especially memory, will be quite limited. The
program must operate from a memory containing 4096 16-bit words.*
During execution of a search, about half the memory should be
reserved for text.

8. Data will be accessed from the disk by sectors, where each sector
contains about 500 words (1000 characters) and each track contains
8 sectors. Any number of sectors may be requested in a single
read instruction.

9. Disk I/O is to be minimized: whenever possible, a given body of
text should be read from disk only once per inquiry.

A collection containing the full text of 65 technical articles in the field of
information retrieval has been obtained for initial experimentation.

* 8192 words are available in the present configuration, but the original design
was for 4096, and the program can run in that space.
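A little arithmetic with the figures quoted in constraints 7 and 8 shows what they imply. It assumes two 8-bit characters per 16-bit word (consistent with 500 words holding about 1000 characters) and ignores any disk formatting overhead: the half of memory reserved for text holds roughly four sectors, about half a track, and the disk holds on the order of 3000 sectors.

```python
# Figures quoted in design constraints 7 and 8 (all approximate).
WORD_BITS = 16
MEMORY_WORDS = 4096            # design memory size
SECTOR_WORDS = 500             # "about 500 words" per sector
SECTORS_PER_TRACK = 8
DISK_BITS = 25_000_000         # "about 25 million bits"

CHARS_PER_WORD = WORD_BITS // 8                  # assumed 8-bit characters
text_buffer_chars = (MEMORY_WORDS // 2) * CHARS_PER_WORD
sector_chars = SECTOR_WORDS * CHARS_PER_WORD
disk_sectors = DISK_BITS // (SECTOR_WORDS * WORD_BITS)

print(text_buffer_chars)                         # 4096 characters of text space
print(text_buffer_chars / sector_chars)          # 4.096 sectors
print(disk_sectors, disk_sectors / SECTORS_PER_TRACK)  # 3125 390.625
```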
Because  this  program  is  experimental  and  because  it  was  written  for  a 
newly-designed  minicomputer  (the  Burroughs  D-Machine)  before  the  actual  hardware 
became  available  and  before  the  configuration  to  be  installed  had  been  completely 
determined,  it  inevitably  contains: 

1.  Some  facilities  whose  value  is  unknown. 

2. Some facilities designed to assist in testing and revising the
main program, notably the DUMP and CHANGE routines. These
procedures normally would not be included in a "production" version
of the program which was to be used for retrieval experiments.

3.  Some  implementation  features  which  should  be  reexamined  in  light 
of  the  conditions  which  actually  exist.  In  particular,  it  was 
felt  originally  that  memory  space  would  be  the  most  critical 
resource,  and  several  design  decisions  were  made  in  the  interest 
of  conserving  memory. 

The  design  configuration  for  which  the  retrieval  program  was  written 
consists  of  a  microprogrammable  Burroughs  D-Machine  minicomputer  with  1024, 
64-bit  words  of  microstore  and  4096,  16-bit  words  of  main  memory.  Peripherals 
include  a  card  reader,  a  line  printer,  and  a  disk  with  a  storage  capacity  of 
about  25  million  bits.  The  retrieval  program  is  written  in  an  assembly  language 
called  the  S-Language.  The  S-Language  and  its  assembler  are  described  in  detail 
in [1] and [2]; but some of its features, which are essential to an
understanding of the present report, will be reviewed here.

The  language  consists  primarily  of  "Word  Instructions",  which  perform 
standard arithmetic, logical, and control functions, and "String Instructions"
which  perform  complicated  manipulations  and  searches  involving  character 
strings.  Sixty-four  "software  registers"  are  reserved  in  main  memory  for 
supplying  the  various  pointers,  characters,  counters  and  transfer  addresses 
required  by  the  string  instructions.  The  three  string  instructions  of  interest 
here  are  FIND,  COMPARE,  and  SEARCHF. 

FIND searches character string S1 for an occurrence of string S2 until
either end character K1 is detected in S1 (failure) or end character K2 is
detected in S2 (success). After execution, the pointers associated with S1 and
S2 may be independently set at their initial or final positions or at the first
character on either side of the final position. The number of characters
examined in S1 is stored automatically in one of the counter registers. Finally,
control is passed to the next instruction in the program or to one of two
independently specified alternative (success or failure) transfer addresses.
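The search behaviour of FIND can be illustrated by a small Python model. This is only a sketch: strings are modelled as byte sequences, the name `s_find` and its return convention (a success flag and the final S1 position) are hypothetical, and only the "final position" pointer setting is modelled. The character count and the other pointer options are omitted.

```python
def s_find(s1: bytes, p1: int, k1: int, s2: bytes, k2: int):
    """Sketch of the FIND string instruction (not the actual microcode).

    Scans s1 from position p1 for an occurrence of s2, whose end is marked
    by the character k2. Returns (True, position just past the match) on
    success, or (False, position of k1 in s1) on failure. s1 must contain k1.
    """
    pattern = s2[: s2.index(k2)]          # S2 up to, not including, K2
    i = p1
    while s1[i] != k1:                    # reaching K1 in S1 means failure
        if s1[i : i + len(pattern)] == pattern:
            return True, i + len(pattern)
        i += 1
    return False, i


text = b"THE CAT SAT\x80"                 # text ended by :80:
print(s_find(text, 0, 0x80, b"CAT ", ord(" ")))   # (True, 7)
print(s_find(text, 0, 0x80, b"DOG ", ord(" ")))   # (False, 11)
```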

COMPARE  operates  much  like  FIND  except  that  the  two  strings  are 
compared  directly,  and  the  comparison  terminates  whenever  an  end  character  or  a 
mismatch  is  detected.  Processing  of  pointers  and  transfer  addresses  after 
execution  is  similar  to  that  for  the  FIND  instruction. 
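A comparable sketch of COMPARE follows. The text above does not say which end character signals success. By analogy with FIND, this model assumes that reaching K2 in S2 with no mismatch is success, and that a mismatch or K1 in S1 is failure. The name `s_compare` and its return value (flag, final S1 position, final S2 position) are hypothetical.

```python
def s_compare(s1: bytes, p1: int, k1: int, s2: bytes, p2: int, k2: int):
    """Sketch of COMPARE: match s1 and s2 character by character.

    Assumed convention: success when K2 is reached in s2; failure on a
    mismatch or when K1 is reached in s1.
    """
    i, j = p1, p2
    while True:
        if s2[j] == k2:
            return True, i, j            # all of S2 matched
        if s1[i] == k1 or s1[i] != s2[j]:
            return False, i, j           # end of S1 or mismatch
        i += 1
        j += 1


print(s_compare(b"FINDER\x80", 0, 0x80, b"FIND ", 0, ord(" ")))  # (True, 4, 4)
print(s_compare(b"FOUND\x80", 0, 0x80, b"FIND ", 0, ord(" ")))   # (False, 1, 1)
```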

SEARCHF searches forward through the character string S1 until any of
three specified key characters is located. Again, pointer manipulation is
performed, a character count is saved, and independent transfer addresses may be
specified for each of the three keys.
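SEARCHF can be modelled in the same spirit. In this hypothetical sketch, `s_searchf` returns which of the three keys was found, since that choice selects the transfer address. It also returns the key's position and a count equal to the number of characters examined minus 1. That count follows the convention described later in Section 3.4.3 for Counter Register 0. At least one key must occur in S1.

```python
def s_searchf(s1: bytes, p1: int, keys):
    """Sketch of SEARCHF: scan s1 forward from p1 until one of three keys.

    Returns (key number 0-2, position of the key, characters examined - 1).
    """
    i = p1
    while s1[i] not in keys:
        i += 1
    return keys.index(s1[i]), i, i - p1


keys = (ord(")"), ord("+"), 0x80)
print(s_searchf(b"ABC+D)\x80", 0, keys))   # (1, 3, 3): "+" found first
```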

The  remainder  of  this  report  describes  the  operating  details  of  the 
retrieval  control  program.  Section  2  deals  with  the  structure  and  searching  of 
certain  control  tables.  Sections  3  and  4  describe  the  parsing  of  a  search 
request  and  the  scheduling  of  the  search.  Section  5  discusses  the  decoding  of 
a  PRINT  instruction.  Sections  6  and  7  explain  the  actual  control  of  searching 
and printing and the interaction between the two. Sections 8-10 deal with the
possible  implementation  of  a  negative  search  request  (FIND  'TERM  A'  BUT  NOT 
'TERM  B'),  with  the  effects  of  certain  system  errors  (mainly  errors  in  the  data 
base),  and  with  those  debugging  facilities  which  are  incorporated  into  the 
present  version  of  the  program.  Flow  charts  and  register  assignments  are  given 
in  the  appendices. 

Throughout this report, hexadecimal character strings will be written
between colons as, for example, in :80AD:, which represents the bit string
'1000 0000 1010 1101'.
2.0 Table Searching

In  order  to  conserve  memory  space  and  to  take  advantage  of  the 
specialized  text-searching  features  of  the  S-Language,  the  system  tables 
INSTKEY,  RESKEY  and  CTXT  have  been  designed  for  access  by  sequential  search. 
As  a  result,  some  frequently-occurring  searches  can  be  performed  by  means 
of  a  single  string  instruction  and  others  by  a  standard  short  procedure  (TSRCH) 
containing  only  five  instructions.  Many  of  the  required  register-loading 
operations  need  be  performed  only  once  in  preparation  for  an  arbitrary  number 
of  similar  searches. 

Three special characters, :81:, :82:, and :80:, are used in constructing
these  tables.  The  :81:  and  :82:  mark  the  beginning  and  end,  respectively,  of 
each  search  key,  and  are  used  to  guarantee  an  exact  match  between  the  input 
string  and  the  table  entry.  The  :80:  identifies  the  end  of  the  table.  Data 
associated  with  a  search  key  begins  in  the  high-order  byte  of  the  first  word 
following  the  :82:  and  continues  for  as  many  bytes  as  necessary.  The  fill 
character  :00:  is  used  as  required  to  assure  word-boundary  alignment  for  the 
data  entries. 

A typical table organized for sequential searching is INSTKEY, which
contains the keywords FIND, PRINT, DUMP, and CHANGE, used to identify input
instructions. Following are the first few entries in this table:

:81:"FIND":82:A(FIND PROCESSING):81:"PRINT":8200:A(PRINT PROCESSING) . . .

where hexadecimal digits (4 bits each) are contained between colons, alphabetic
character strings (8 bits/character) are enclosed in quotation marks, and A(X)
(one full 16-bit word, aligned on a word boundary) represents a transfer
address to the appropriate instruction decoding procedure.

(PRINT is a legal input even though it causes no processing in the present
implementation. DUMP and CHANGE are system commands to be explained in
Section 10.)
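The entry layout just described can be reproduced with a short Python sketch. The name `build_key_table` and the transfer addresses :0120: and :0240: are hypothetical. The sketch assumes the table begins on a word boundary and stores each data word high-order byte first, as the text describes. Data bytes must not contain :80:, since a sequential search would stop there.

```python
KEY_START, KEY_END, END_OF_TABLE, FILL = 0x81, 0x82, 0x80, 0x00


def build_key_table(entries):
    """Lay out (keyword, 16-bit data word) pairs for sequential search.

    Each entry is :81: keyword :82:, padded with :00: so that the data word
    starts on a word (2-byte) boundary, then the data word, high-order byte
    first. The table ends with :80:.
    """
    out = bytearray()
    for keyword, word in entries:
        out.append(KEY_START)
        out += keyword.encode("ascii")
        out.append(KEY_END)
        if len(out) % 2:                  # odd length: add the fill character
            out.append(FILL)
        out += word.to_bytes(2, "big")
    out.append(END_OF_TABLE)
    return bytes(out)


# Hypothetical transfer addresses, for illustration only.
instkey = build_key_table([("FIND", 0x0120), ("PRINT", 0x0240)])
print(instkey.hex(" "))
# 81 46 49 4e 44 82 01 20 81 50 52 49 4e 54 82 00 02 40 80
```

The FIND entry needs no fill character, while the PRINT entry does. This matches the :82: versus :8200: in the example above.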

In  order  to  identify  an  input  instruction,  the  program  first  inserts 
the  character  :81:  ahead  of  the  first  non-blank  character  on  the  input  line. 
It  then  calls  TSRCH,  which  executes  a  FIND  instruction  in  an  attempt  to  locate 
the  input  keyword  in  the  INSTKEY  table.  Searching  stops  whenever  a  blank  is 
located  in  the  input  line  (instruction  keywords  must  be  followed  by  at  least  one 
blank)  or  when  the  character  :80:  (end-of-table)  is  encountered  in  the  table. 
If  an  :80:  is  detected,  control  is  transferred  to  an  error  handling  routine; 
otherwise,  the  Data  Pointer  (pointing  into  the  table)  is  left  with  the  byte 
address  of  the  first  unmatched  character.  The  program  next  attempts  to  verify 
by  means  of  a  COMPARE  instruction  that  that  first  unmatched  character  is  :82:. 
If  the  comparison  fails,  control  is  transferred  directly  back  to  the  FIND 
instruction  described  above,  and  the  search  resumes  where  it  left  off.  If  the 
comparison  succeeds,  the  Data  Pointer  is  left  pointing  two  characters  beyond 
the  :82:,  i.e.,  to  either  byte  of  the  word  containing  the  desired  transfer 
address;  and  control  is  returned  to  the  calling  program,  which  uses  the 
contents  of  the  table  pointer  to  access  the  required  data. 
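The look-up procedure can be sketched in Python as follows. This is a model of the logic, not the S-Language routine. FIND is reduced to a substring scan, the COMPARE step is a single character test, and the two-entry table and its transfer addresses are hypothetical. The table is assumed to start at an even byte address, so that halving a byte address gives a word address.

```python
END_OF_TABLE, KEY_START, KEY_END = 0x80, 0x81, 0x82
BLANK = ord(" ")

# Hypothetical INSTKEY-style table: FIND -> :0120:, PRINT -> :0240:.
INSTKEY = bytes.fromhex("81 46 49 4e 44 82 01 20 "
                        "81 50 52 49 4e 54 82 00 02 40 "
                        "80")


def find(s1, p1, k1, pattern):
    """FIND model: scan s1 from p1 for pattern; fail on reaching k1."""
    i = p1
    while s1[i] != k1:
        if s1[i:i + len(pattern)] == pattern:
            return True, i + len(pattern)   # pointer at first unmatched char
        i += 1
    return False, i


def tsrch(table, line):
    """Return the data word for the keyword starting the line, or None."""
    text = line.lstrip(b" ")
    if BLANK not in text:
        return None                          # keyword must be followed by a blank
    key = bytes([KEY_START]) + text[:text.index(BLANK)]  # :81: inserted in front
    pos = 0
    while True:
        found, pos = find(table, pos, END_OF_TABLE, key)
        if not found:
            return None                      # :80: reached: error routine
        if table[pos] == KEY_END:            # COMPARE step: exact match only
            data_ptr = pos + 2               # two characters beyond :82:
            word = data_ptr // 2             # either byte gives the same word
            return int.from_bytes(table[2 * word:2 * word + 2], "big")
        # Not :82:, so resume the FIND where it left off.


print(hex(tsrch(INSTKEY, b"  PRINT 'CAT'")))   # 0x240
print(tsrch(INSTKEY, b"FIN 'X'"))              # None
```

The leading :81: prevents a keyword from matching the tail of a longer table key. The :82: test rejects an input such as FIN, which is only a prefix of a table keyword.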

In  addition  to  INSTKEY,  RESKEY  and  CTXT,  the  table,  CHARS,  is  also 
entered  by  sequential  search;  but  it  is  only  used  to  identify  a  character  or 
character  type  according  to  its  position  in  the  table,  as  recorded  auto- 
matically in  Counter  Register  0  at  the  completion  of  the  search. 

It  is  usually  possible  to  modify  these  tables  simply  by  deleting 
unwanted  entries  or  adding  new  ones  at  any  convenient  position.  In  general, 
frequently-used  entries  should  be  stored  ahead  of  those  which  are  less  commonly 
accessed. 
3.0 FIND Statement Processing

Conceptually,  search  requests  transmitted  to  the  system  by  means  of 
a  FIND  statement  can  be  arbitrarily  complicated  Boolean  expressions  whose 
"variables"  are  character  strings  to  be  located  in  the  data  base.  Search 
terms  are  separated  by  "+"  (logical  OR)  or  "*"  (logical  AND)  and  may  be  grouped 
in  any  desired  way  by  means  of  parentheses.  In  the  absence  of  parentheses, 
the  operator  AND  is  considered  to  be  dominant  over  the  operator  OR,  i.e., 

X  +  Y  *  Z 

is  equivalent  to 

X  +  (Y*Z). 

Negation is not allowed in the current version of the system, although, as
explained in Section 8 below, provision has been made for its later
implementation.

In  order  to  control  the  progress  of  a  search,  it  is  necessary  to 
convert  the  user's  request  into  a  form  which  preserves  the  logical  structure  of 
the  original  but  which  can  be  manipulated  more  conveniently.  This  conversion 
is  accomplished  with  a  single,  serial,  left-to-right  scan  of  the  input  line, 
using  a  procedure  based  on  a  system  of  internally  generated  tags  similar  to 
that  employed  in  the  "Decision  Module  Compiler"  (DMC)  [3].  In  the  DMC,  these 
tags  are  treated  effectively  as  statement  labels  to  assist  in  code  generation. 
In  the  system  under  discussion  here,  they  are  used  in  a  corresponding  way  to 
construct  a  table  for  controlling  the  retrieval  operation  and  for  determining 
the  order  in  which  search  terms  are  considered.  The  construction  and  use  of 
such  a  table  and  the  search  scheduling  algorithms,  to  be  discussed  in  Section  4, 
are  believed  to  be  original. 
3.1 Tag Structure and Meaning

A  syntactically  correct  search  request  contains  up  to  six  kinds  of 
elements  in  addition  to  the  keyword  FIND: 

1) Search terms: character strings enclosed between apostrophes

2) Operators: "+" and "*"

3) Left parentheses

4) Right parentheses

5) End symbol: carriage return or the character string "IN_", where "_" represents a blank

6) Blanks not included within search terms.

The  term  "entity"  will  be  used  to  refer  to  two  types  of  expressions  composed  of 
these  elements: 

1)  Search  terms  and 

2)  Character  strings  beginning  with  a  left  parenthesis  and 
ending  with  a  corresponding  right  parenthesis. 

Note  that  one  entity  may  be  contained  within  another  as  in 

('A' + ('B' + ('C' * 'D'))),

which contains seven entities (four search terms and three parenthesized
expressions). Nevertheless, any valid search request may be regarded as a series
of alternating entities and operators, beginning and ending with an entity,
where each entity may in turn contain a similar alternating series.

In building the required data structure, an internally generated tag
is assigned to each entity in the search request. Each tag is stored in one
full word of memory (16 bits) and contains two fields: the level field (4 bits)
and the disjunction field (8 bits) (see Figure 3.1). The level field reflects
the depth to which an entity is nested in parentheses. The disjunction field
indicates the relationship between the current entity and the one immediately
preceding it at the same level. The high-order bit of each tag is reserved
for use by the search control routines; the other three bits are available for
expansion, e.g., to accommodate the operator "NOT".


Bit 0 (*): reserved for search control routines
Bits 1-3: unused
Bits 4-7: level field
Bits 8-15: disjunction field

Figure 3.1. Tag Structure

In  parsing  a  search  request,  the  entire  expression  is  treated  for 
internal  purposes  as  if  it  were  enclosed  in  parentheses  and  is  assigned  the  tag 
:0100:.  This  is  the  only  level  1  entity  that  ever  occurs. 

The scanning procedure next identifies the first entity of the input
statement and assigns to it the tag :0200:. If another entity exists at level 2,
it is assigned either the tag :0200: or the tag :0201:, depending upon whether it
is joined to the first entity by "AND" or "OR". As the processing continues,
all related entities at a given level joined by "AND" receive identical tags, and
those joined by "OR" receive tags with increasing disjunction values. When a
left parenthesis is encountered, the quantity it represents is assigned the
appropriate tag, processing at the present level ($L_i$) is suspended, and a new
series of tags at the next higher level ($L_i + 1$) is initiated for the entities
within the parentheses. When the corresponding right parenthesis is encountered,
level $L_i + 1$ processing is terminated and level $L_i$ processing is resumed. No
overflow from one tag field to another is ever permitted, and attempts to "OR"
together more than 256 entities or to nest parentheses to a depth greater than
13 result in error messages and termination of the parsing procedure.
As an illustration of the tag assignment system, consider the input
expression

$$E * C * (_1 D + (_2 (_3 F )_3 )_2 * Y * A * (_4 B * Z )_4 + G * (_5 H + I * J + K )_5 )_1 + L * M \qquad (3.1)$$

For  notational  convenience  in  this  example  and  those  which  follow,  search  terms 
will  be  represented  by  upper  case  alphabetic  characters  and  no  apostrophes  will 
be  shown.  Subscripts  will  be  attached  to  left  and  right  parentheses  for  the 
purpose  of  identifying  "mates"  and  for  use  in  referring  to  parenthesized 
quantities. 

As explained previously, the entire expression would be assigned the
tag :0100:. At level 2, the expression has the form

$$E * C * (_1 \ldots )_1 + L * M,$$

and the appropriate tags are :0200: for E, C, and $(_1 \ldots )_1$, and :0201: for L and M.
Entities in the following subexpression receive level 3 tags:

$$D + (_2 \ldots )_2 * Y * A * (_4 \ldots )_4 + G * (_5 \ldots )_5 .$$

As the scan proceeds from left to right, level 4 is "entered" three
times, once for each of the following subexpressions:

$$(_3 \ldots )_3$$

$$B * Z$$

$$H + I * J + K .$$

At each entry, the level 4 assignment procedure is reinitialized so
that tags are assigned as shown in Table 3.1. Because of the order in which
tags are assigned and stored, and because related tags are joined together by a
system of pointers (to be discussed in Section 3.2), no ambiguity results from
assigning the tag :0400: to four different entities in three different level 4
entries. The complete list of tag assignments for this example is shown in
Table 3.2.


Table 3.1. Level Four Tag Assignments

Entry | Entity | Tag
1 | $(_3 \ldots )_3$ | :0400:
2 | B | :0400:
2 | Z | :0400:
3 | H | :0400:
3 | I | :0401:
3 | J | :0401:
3 | K | :0402:

Table 3.2. Tag Assignments for Expression 3.1

No. | Entity | Tag
1 | Complete expression | :0100:
2 | E | :0200:
3 | C | :0200:
4 | $(_1 \ldots )_1$ | :0200:
5 | D | :0300:
6 | $(_2 \ldots )_2$ | :0301:
7 | $(_3 \ldots )_3$ | :0400:
8 | F | :0500:
9 | Y | :0301:
10 | A | :0301:
11 | $(_4 \ldots )_4$ | :0301:
12 | B | :0400:
13 | Z | :0400:
14 | G | :0302:
15 | $(_5 \ldots )_5$ | :0302:
16 | H | :0400:
17 | I | :0401:
18 | J | :0401:
19 | K | :0402:
20 | L | :0201:
21 | M | :0201:
By reversing the rules for assigning tags, one can reconstruct the
form of a search request from the list of assigned tags, although that operation
is not required in the retrieval program.
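The tag assignment rules can be checked with a short Python sketch that reproduces Table 3.2 from Expression 3.1. The function `assign_tags` and its labelling scheme are hypothetical. Search terms are single letters, as in the report's notation, and a parenthesized entity is labelled "(n", with n counting left parentheses from the left. The overflow checks follow the limits stated above.

```python
def assign_tags(expr):
    """Assign level/disjunction tags to the entities of a search request."""
    tags = [("whole request", 0x0100)]       # the only level 1 entity
    ctag = 0x0200                            # tag for the next level 2 entity
    stack = []                               # saved tags of suspended levels
    paren_count = 0
    for ch in expr:
        if ch == " ":
            continue
        if ch == "(":
            paren_count += 1
            tags.append((f"({paren_count}", ctag))
            if (ctag >> 8) == 0x0F:          # level field would overflow
                raise ValueError("parentheses nested deeper than 13")
            stack.append(ctag)
            ctag = ((ctag >> 8) + 1) << 8    # next level, disjunction 0
        elif ch == ")":
            if not stack:
                raise ValueError("unmatched right parenthesis")
            ctag = stack.pop()               # resume the suspended level
        elif ch == "+":
            if (ctag & 0xFF) == 0xFF:        # disjunction field would overflow
                raise ValueError("more than 256 entities ORed together")
            ctag += 1                        # OR: next disjunction value
        elif ch == "*":
            pass                             # AND: same tag
        else:
            tags.append((ch, ctag))          # search term
    return tags


EXPR = "E*C*(D+((F))*Y*A*(B*Z)+G*(H+I*J+K))+L*M"   # Expression 3.1
for entity, tag in assign_tags(EXPR):
    print(f"{entity:14} :{tag:04X}:")
```

The printed tags agree, entity by entity, with Table 3.2.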

3.2  The  TAG  Table 

For  purposes  of  search  scheduling  and  monitoring,  tags  are  stored 
together  with  other  definition  and  control  information  in  a  table  called  TAGS. 
The  first  two  words  of  the  table  contain  the  word  addresses  of  the  first  and 
last  available  locations  in  the  table.  Following  this  information,  each  data 
entry  consists  of  four  words: 


WORD  1. 

Tag 

WORD  2. 

STRTXT  Address 

WORD  3. 

"Success"  Pointer 

WORD  4. 

"Failure"  Pointer 

If  the  tag  stored  in  Word  1  represents  a  parenthesized  expression,  then 
Word  2  is  set  to  zero.  If  the  tag  represents  a  search  term,  however,  then  Word  2 
contains  the  byte  address  of  the  first  character  of  the  term  as  stored  in  the 
STRTXT  Table  (Section  3.3).  The  significance  of  "success"  and  "failure"  in 
connection  with  the  last  two  words  of  a  Tag  Table  entry  will  be  explained  in 
Section  4.  For  now  it  is  sufficient  to  note  that  these  two  words  contain  a 
system  of  pointers  used  to  control  the  progress  of  a  search.  This  system  is 
constructed  in  two  phases,  the  first  performed  by  the  parsing  routine  and  the 
second  by  a  later  search  scheduling  procedure. 

The  end  of  an  expression  in  the  Tag  Table  is  identified  by  a  zero  in 
Word  1  of  the  first  unused  entry. 

Each time a new level of parentheses is entered, the parsing routine
constructs a chain of pointers linking all entities at that particular level
and entry. One of these pointers occupies Word 3 of each entry and points to
Word 1 of its successor. A pointer value of zero identifies the last element
on a chain. The first pointer of a chain is stored in Word 4 of the Tag Table
entry for the parenthesized quantity itself, and it points to the Tag Table
entry for the first entity inside the parentheses.

In  addition  to  constructing  a  system  of  linked  lists  in  the  Tag 
Table,  the  parsing  procedure  can  store  in  Word  4  of  the  entry  for  each  search 
term  selected  information  for  use  by  the  scheduling  routines.  At  present,  the 
length  of  the  search  term  in  bytes  is  stored  in  this  location. 

The  complete  Tag  Table  for  Expression  3.1,  as  it  would  appear  at  the 
end  of  the  parsing  procedure,  is  shown  in  Table  3.3.  In  this  table,  A(X) 
represents  the  STRTXT  address  of  search  term  "X";  and  L(X)  represents  the  length 
of  term  "X".  For  the  purpose  of  illustrating  pointers,  Tag  Table  entries  are 
numbered  "Tn",  where  n  is  an  integer;  and  the  four  words  within  an  entry  are 
labelled  A,  B,  C  and  D.  Hence,  the  notation  "T6A"  refers  to  the  first  word  of 
Tag  Table  entry  6. 
Table 3.3. Tag Table Entries for Expression 3.1 after Parsing Procedure

Entry | A | B | C | D
T1 | :0100: | 0 | 0 | T2A
T2 | :0200: | A(E) | T3A | L(E)
T3 | :0200: | A(C) | T4A | L(C)
T4 | :0200: | 0 | T20A | T5A
T5 | :0300: | A(D) | T6A | L(D)
T6 | :0301: | 0 | T9A | T7A
T7 | :0400: | 0 | 0 | T8A
T8 | :0500: | A(F) | 0 | L(F)
T9 | :0301: | A(Y) | T10A | L(Y)
T10 | :0301: | A(A) | T11A | L(A)
T11 | :0301: | 0 | T14A | T12A
T12 | :0400: | A(B) | T13A | L(B)
T13 | :0400: | A(Z) | 0 | L(Z)
T14 | :0302: | A(G) | T15A | L(G)
T15 | :0302: | 0 | 0 | T16A
T16 | :0400: | A(H) | T17A | L(H)
T17 | :0401: | A(I) | T18A | L(I)
T18 | :0401: | A(J) | T19A | L(J)
T19 | :0402: | A(K) | 0 | L(K)
T20 | :0201: | A(L) | T21A | L(L)
T21 | :0201: | A(M) | 0 | L(M)
T22 | :0000: | - | - | -
3.3 The STRTXT Table

Character  strings  specified  by  the  user  as  search  terms  are  stored 
in  the  STRTXT  table.  Search  terms  are  stored  in  one  continuous  string  with 
the  character  :80:  serving  as  a  separator  and  as  a  stop  character  for  search 
operations.  The  first  two  words  of  the  table  contain,  respectively,  the  address 
of  the  first  available  byte  and  the  address  of  the  last  byte  reserved  for  data 
storage. 
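A minimal Python sketch of this storage scheme follows. The class `StrTxt` is hypothetical. Its `first_free` and `last_reserved` attributes stand in for the two header words, as byte offsets rather than machine addresses. The offset returned for each term plays the role of the STRTXT address kept in Word 2 of a Tag Table entry.

```python
class StrTxt:
    """Sketch of STRTXT: terms stored in one continuous string, each
    followed by :80:, which serves as separator and stop character."""

    SEPARATOR = 0x80

    def __init__(self, capacity):
        self.data = bytearray(capacity)
        self.first_free = 0                  # header word 1 (as an offset)
        self.last_reserved = capacity - 1    # header word 2 (as an offset)

    def add_term(self, term):
        """Store a term and return the offset of its first character."""
        encoded = term.encode("ascii") + bytes([self.SEPARATOR])
        end = self.first_free + len(encoded)
        if end - 1 > self.last_reserved:
            raise OverflowError("STRTXT table full")
        start = self.first_free
        self.data[start:end] = encoded
        self.first_free = end
        return start


strtxt = StrTxt(capacity=32)
print(strtxt.add_term("RETRIEVAL"), strtxt.add_term("FILE"))   # 0 10
```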

3.4  Processing  Procedures 

3.4.1  Tag  Assignment  --  The  Level  Table 

Processing  of  the  current  input  element,  including  tag  assignment, 
is  carried  on  by  means  of  the  two  variables,  CTAG  and  LINK,  and  a  thirteen-level 
stack  (LOGICLVL)  to  be  called  the  Level  Table.  CTAG  contains  the  tag  to  be 
assigned  to  the  next  entity  encountered  at  the  present  level  and  entry.  LINK 
contains  the  address  of  the  link  field  (Word  3)  of  the  last  Tag  Table  entity 
at  the  present  level  and  entry.  The  Level  Table  provides  a  means  of  restoring 
the  values  of  CTAG  and  LINK  that  were  effective  at  a  particular  level  when 
processing  at  that  level  was  suspended  by  the  detection  of  a  left  parenthesis. 
The  actions  taken  by  the  processing  procedure  for  each  kind  of  input  element 
are  summarized  in  Table  3.4. 
Table 3.4. Parsing Procedures


INPUT  ELEMENT 

ACTION 

Left  Parenthesis 

1.  Create  Tag  Table  entry  for  parenthesized  ex- 
pression. 

2.  Enter  address  of  new  Tag  Table  entry  at 
location  shown  in  LINK. 

3.  Enter  CTAG  and  LINK  on  top  of  Level  Table  stack. 

4.  Generate  new  CTAG  by  adding  1  to  previous  value 
of  level  field  and  clearing  disjunction  field. 

Right  Parenthesis 

Restore  values  of  CTAG  and  LINK  from  top  of 
Level  Table  stack. 

Search  Term 

1.  Create  Tag  Table  entry  for  search  term. 

2.  Enter  address  of  new  Tag  Table  entry  at  the 
location  shown  in  LINK. 

3.  Move  text  of  search  term  to  STRTXT  Table. 

Operator  "+" 

Increment  disjunction  field  in  CTAG. 

Operator  "*" 

Continue  (no  action  required) 

Carriage  Return 

Terminate  Tag  Table  with  :0000:  in  first 
unused  tag  field. 

"IN  " 

Stop  parsing  logical  expression;  prepare  to 
process  an  "IN"  clause. 

Blank 

Skip. 
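The actions of Table 3.4 can be modelled in Python to show how they produce the linked entries of Table 3.3. Several details are assumptions of this sketch, named `build_tag_table`. Entry numbers n stand for the address TnA, and 0 is a null pointer. Word B holds the term itself instead of its STRTXT address, and terms are single letters, so every length in word D is 1. Table 3.4 does not say how LINK advances. The sketch assumes that LINK moves to Word 3 (C) of each newly created entry and, inside new parentheses, starts at Word 4 (D) of the parenthesized entry. This is the choice that reproduces Table 3.3.

```python
def build_tag_table(expr):
    """Build Tag Table entries [A, B, C, D] following Table 3.4 (sketch)."""
    table = [[0x0100, 0, 0, 0]]          # T1: the entire request
    ctag, link = 0x0200, (1, 3)          # level 2 chain starts in T1, word D
    level_table = []                     # stack of saved (CTAG, LINK)
    for ch in expr.replace(" ", ""):
        if ch == "+":
            ctag += 1                    # increment disjunction field
        elif ch == "*":
            pass                         # no action required
        elif ch == ")":
            ctag, link = level_table.pop()
        else:                            # "(" or a one-letter search term
            is_paren = ch == "("
            table.append([ctag, 0 if is_paren else ch, 0,
                          0 if is_paren else len(ch)])
            n = len(table)               # the new entry is Tn
            table[link[0] - 1][link[1]] = n   # store its address at LINK
            link = (n, 2)                # assumed: LINK moves to Tn, word C
            if is_paren:
                level_table.append((ctag, link))
                ctag = ((ctag >> 8) + 1) << 8
                link = (n, 3)            # assumed: inner chain starts in word D
    table.append([0x0000, 0, 0, 0])      # :0000: in first unused tag field
    return table


for n, (a, b, c, d) in enumerate(build_tag_table("E*C*(D+((F))*Y*A*(B*Z)+G*(H+I*J+K))+L*M"), 1):
    print(f"T{n:<3} :{a:04X}:  {b!s:>2}  {c:>3}  {d:>3}")
```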
3.4.2 Search Context Restriction -- The "IN" Clause

After the terminator "IN_" has been identified, the parsing routine
locates the requested context name on the input line, performs a standard
(sequential) look-up to locate the term in the CTXT table, and moves the
appropriate context delimiters to Character Registers 0 and 1, which are
reserved for this purpose.

Context  delimiters  are  located  in  the  low-order  bytes  of  the  first 
two  words  following  the  keyword  end  symbol  (:82:).  For  example,  :D3:  and 
:D4:  are  the  beginning  and  end  delimiters,  respectively,  for  TITLE,  which  has 
the  following  entry  in  the  CTXT  Table: 

. . . :81:"TITLE":8200::8CD3::8CD4: . . .
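Extracting the delimiters from this entry can be sketched in Python. The function `context_delimiters` is hypothetical, and the returned pair stands in for Character Registers 0 and 1. The table is assumed to start on a word boundary. The high-order bytes (:8C:) are not used here.

```python
KEY_END = 0x82

# CTXT entry for TITLE, as shown above: :81:"TITLE":8200::8CD3::8CD4: then :80:
ctxt = b"\x81TITLE\x82\x00\x8c\xd3\x8c\xd4\x80"


def context_delimiters(table, key_end_pos):
    """Return (begin, end) delimiters for the entry whose :82: is at key_end_pos.

    The data starts in the first word after :82:; the delimiters are the
    low-order bytes of that word and of the following word.
    """
    first_word = (key_end_pos + 2) // 2      # word index of first data word
    begin = table[2 * first_word + 1]        # low-order byte of word 1
    end = table[2 * first_word + 3]          # low-order byte of word 2
    return begin, end


begin, end = context_delimiters(ctxt, ctxt.index(KEY_END))
print(f":{begin:02X}: :{end:02X}:")          # :D3: :D4:
```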

3.4.3  Syntax  Checking 

Syntax  checking  in  the  FIND  decode  routine  consists  mainly  of 
identifying  each  new  element  in  the  input  stream  and  determining  whether  or  not 
this  element  is  a  legal  successor  for  the  previous  element. 

Consider  first  the  rules  of  succession,  as  shown  in  Table  3.5.  Each 
line  in  the  table  consists  of  four  parts:  an  input  element  type,  E;  its 
Identification  Code;  a  list  of  elements  which  may  legally  follow  E;  and  a  Legal 
Successor Code. The Identification Code designates a particular bit, assigned to
E, within a 16-bit computer word. The Legal Successor Code is simply the disjunction
(bitwise OR) of the Identification Codes for those elements which may legally follow E.

In  order  to  test  for  legal  succession  using  this  system  of  codes,  the 
parsing  routine  need  only  "AND"  together  the  Legal  Successor  Code  of  one 
element  and  the  Identification  Code  of  its  successor.  If  the  result  is  non- 
zero the  succession  is  legal;  otherwise  it  is  not. 
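The single-AND test can be sketched in Python. Table 3.5 is not reproduced in this section, so the bit assignments below are hypothetical. The successor rules are derived from the grammar described in Section 3.1, in which entities and operators alternate, each request begins and ends with an entity, and the keyword FIND is treated as a pseudo-element START. Parenthesis matching is checked separately, as described below.

```python
# Hypothetical Identification Codes: one bit per element type.
TERM, OPERATOR, LEFT_PAREN, RIGHT_PAREN, END_SYMBOL = 0x01, 0x02, 0x04, 0x08, 0x10
START = 0x20                                 # pseudo-element for the keyword FIND

# Legal Successor Code = OR of the Identification Codes that may follow.
LEGAL_SUCCESSORS = {
    START:       TERM | LEFT_PAREN,
    TERM:        OPERATOR | RIGHT_PAREN | END_SYMBOL,
    OPERATOR:    TERM | LEFT_PAREN,
    LEFT_PAREN:  TERM | LEFT_PAREN,
    RIGHT_PAREN: OPERATOR | RIGHT_PAREN | END_SYMBOL,
}


def legal_sequence(elements):
    """Check each element against its predecessor with a single AND."""
    previous = START
    for element in elements:
        if (LEGAL_SUCCESSORS.get(previous, 0) & element) == 0:
            return False
        previous = element
    return True


# ('A' * ('B' + 'C')) followed by a carriage return
print(legal_sequence([LEFT_PAREN, TERM, OPERATOR, LEFT_PAREN, TERM, OPERATOR,
                      TERM, RIGHT_PAREN, RIGHT_PAREN, END_SYMBOL]))   # True
```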

In the present implementation, element identification and validity
testing are accomplished by means of two tables, CHARS and LEGALTAB. CHARS
consists of a list of initial characters by which elements may be identified:

_, (, ), +, *, ', :0D0D:, A, B, C, . . . , X, Y, Z, 0, 1, 2, . . . , 7, 8, 9, :8080:

where "_" represents a blank, :0D: is a carriage return and :80: marks the end
of the table. LEGALTAB contains one three-word entry for each type of element:

WORD  1.    Transfer  address  to  processing  routine 
for  this  element. 

WORD  2.     Identification  Code  for  this  element. 

WORD  3.    Legal  Successor  Code  for  this  element. 

When the first character of a new element has been found, an attempt
is made, using a Search Forward (SEARCHF) instruction, to locate either that
first character or an :80: in the CHARS Table. After execution of this
instruction, the number of characters examined (minus 1) is stored automatically
in Counter Register 0 and can be used to identify the character. If the symbol
is a "special character" (1 ≤ CTR0 ≤ 6), then CTR0 is used directly as an index
into LEGALTAB. Otherwise, further searching in other tables must be performed
to identify the new element as a legal context name, the word "IN" or an
illegal alphanumeric string.

After  the  new  element  has  been  identified,  its  Identification  Code  is 
compared  with  the  Legal  Successor  Code  for  the  previous  element  and,  if  the 
succession  is  legal,  control  is  transferred  to  the  appropriate  processing 
routine.  All  data  necessary  for  this  procedure,  including  the  transfer  address, 
is  obtained  from  LEGALTAB. 
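The identification step can be sketched in Python. CHARS is reproduced in the order listed above, including the doubled :0D0D: and :8080:. The functions `ctr0_for` and `element_kind` are hypothetical names, and the returned strings only describe what the program would do next. Counter Register 0 is modelled as the index of the first CHARS byte equal to the character or to :80:.

```python
import string

# CHARS as listed above: blank ( ) + * ' :0D0D: A-Z 0-9 :8080:
CHARS = (b" ()+*'\r\r" + string.ascii_uppercase.encode("ascii")
         + string.digits.encode("ascii") + b"\x80\x80")
END = 0x80


def ctr0_for(ch):
    """SEARCHF model with keys ch and :80:; returns the characters examined
    minus 1, i.e. the value left in Counter Register 0."""
    ctr0 = 0
    while CHARS[ctr0] != ch and CHARS[ctr0] != END:
        ctr0 += 1
    return ctr0


def element_kind(ch):
    ctr0 = ctr0_for(ch)
    if ctr0 == 0:
        return "blank (skipped)"
    if 1 <= ctr0 <= 6:
        return f"special character, LEGALTAB index {ctr0}"
    if CHARS[ctr0] == END:
        return "not in CHARS"
    return "alphanumeric: search further tables"


print(element_kind(ord("(")))   # special character, LEGALTAB index 1
print(element_kind(ord("T")))   # alphanumeric: search further tables
```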

Other error checking procedures consist mainly of testing for overflow
in the various tables and tag fields. The level field in the variable CTAG
provides a convenient counter for detecting unmatched parentheses: a right
parenthesis detected while the level of CTAG < 3 is unmatched, and some left
parenthesis is unmatched if an end symbol is detected when the level of CTAG ≠ 2.
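The level-counter test can be sketched in Python. The function `check_parentheses` is hypothetical. It tracks only the level field, starting at 2 for the outermost level, and uses a carriage return as the end symbol. Text between apostrophes is skipped, since a parenthesis inside a search term is not a separate element.

```python
def check_parentheses(request):
    """Detect unmatched parentheses using only the CTAG level value."""
    level = 2                            # level of CTAG at the outermost level
    in_term = False
    for ch in request:
        if ch == "'":
            in_term = not in_term        # entering or leaving a search term
        elif in_term:
            continue
        elif ch == "(":
            level += 1
        elif ch == ")":
            if level < 3:
                return "unmatched right parenthesis"
            level -= 1
        elif ch == "\r":                 # end symbol (carriage return)
            return "ok" if level == 2 else "unmatched left parenthesis"
    return "missing end symbol"


print(check_parentheses("(('A')\r"))     # unmatched left parenthesis
```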
4.0 Search Scheduling

4.1  Scheduling  Criteria 

As  discussed  previously,  an  important  reason  for  building  the  present 
retrieval  system  is  to  investigate  algorithms  for  predicting  the  most  efficient 
order  of  search  among  several  terms  in  a  complicated  inquiry.  While  the  ex- 
perimental system  employs  a  direct  sequential  search  to  locate  responding 
documents,  the  results  of  this  study  should  be  applicable  in  an  inverted  file 
system  as  well,  where  scheduling  procedures  can  be  used  to  reduce  term  co- 
ordination time  by  "controlling"  the  lengths  of  the  intermediate  postings  lists 
which  develop  during  the  coordination  procedure. 

The basic idea is that in a search for 'TERM A' and 'TERM B' together
in a restricted context, whichever term is less likely to occur should be sought
first, since then more context units can be rejected after a single scan because
they do not contain that first term. A corresponding statement applies to a
search for 'TERM A' or 'TERM B': there the term more likely to occur should be
sought first, since more context units can then be accepted after a single scan.
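The argument can be made precise with a small calculation. It rests on two simplifying assumptions not stated in the report: each scan of a context unit for one term has unit cost, and p_A and p_B denote the probabilities that a context unit contains 'TERM A' and 'TERM B'. A second scan is needed only when the first one settles nothing. For AND, that happens when the first term is present; for OR, when it is absent.

```latex
\begin{aligned}
E[\text{scans} \mid A \text{ AND } B,\ A \text{ first}] &= 1 + p_A, &
E[\text{scans} \mid A \text{ AND } B,\ B \text{ first}] &= 1 + p_B,\\
E[\text{scans} \mid A \text{ OR } B,\ A \text{ first}] &= 1 + (1 - p_A), &
E[\text{scans} \mid A \text{ OR } B,\ B \text{ first}] &= 1 + (1 - p_B).
\end{aligned}
```

For AND, the expected cost is smallest when the less probable term is scanned first. For OR, it is smallest when the more probable term is scanned first.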

The  problem  lies  in  finding  some  reasonably  reliable  yet  simple  way 
of  determining  relative  probabilities  of  occurrence  for  individual  character 
strings  and  for  the  arbitrary  combinations  of  strings  which  may  occur  in 
complicated  search  requests. 

This  problem  would  be  partially  solved  if  one  knew  in  advance  the 
probability  of  occurrence  of  each  legal  search  term  in  any  given  context.  This 
condition  is  approached  in  some  inverted  file  systems,  where  one  may  know  how 
many  citations  are  associated  with  each  search  term  before  the  required  processing 
begins.  However,  these  frequency  counts  are  usually  associated  only  with  com- 
plete individual  words.  The  construction,  maintenance  and  use  of  frequency  tables 
for word fragments or arbitrary character strings would be very difficult, or
perhaps  impossible,  even  for  a  small  data  base. 
Alternative indicators of frequency of occurrence to be investigated
include  the  number  of  characters  in  the  input  string,  the  least  common  bigram 
or  trigram  (two  or  three  character  sequence)  in  the  input  string  (i.e.,  the 
bigram  or  trigram  in  the  input  string  which  occurs  least  frequently  in  the  data 
base),  and  the  least  common  initial  bigram  or  trigram.  The  easiest  of  these 
systems  to  implement  and  the  one  which  is  used  in  the  current  version  of  the 
program  is  one  based  on  the  length  of  the  input  string.  An  analysis  of  word 
frequencies  in  thirty-three  documents  in  our  experimental  data  base  has  shown 
that  with  the  exception  of  lengths  one  and  six,  the  frequency  of  occurrence  of 
a  word  is  a  decreasing  function  of  its  length.  It  is  reasonable  to  expect  a 
similar  trend  among  arbitrary  character  strings. 
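A length-based index can be illustrated with a small Python sketch that orders one group of terms joined by a single operator. The function `schedule_by_length` is hypothetical and is not SHALG, whose rules are given in the following sections. The OR ordering, with the more frequent term first, follows the reasoning of Section 4.1 rather than a rule stated in the report.

```python
def schedule_by_length(terms, operator):
    """Order one group of search terms, using term length as the index.

    Longer terms are assumed rarer (large index = low frequency). For an
    AND group ("*"), the rarest term is sought first; for an OR group ("+"),
    the most common one is. Ties keep their original order (sorted is stable).
    """
    rarest_first = operator == "*"
    return sorted(terms, key=len, reverse=rarest_first)


print(schedule_by_length(["DATA", "RETRIEVAL", "FILE"], "*"))
# ['RETRIEVAL', 'DATA', 'FILE']
```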

The  second  part  of  the  problem--the  scheduling  of  arbitrary  combinations 
of  strings  and  parenthesized  expressions  which  may  occur  in  a  complicated  search 
request--is a subject for experimentation. The rules which have been adopted
initially will be explained below.

4.2  Changes  in  Scheduling  Procedures 

Because  the  search  scheduling  procedure  is  to  be  a  topic  for  ex- 
perimentation, the  retrieval  program  has  been  designed  to  permit  easy  substi- 
tution of  one  algorithm  for  another.  After  completion  of  the  FIND  decoding 
process,  the  TAG  Table  as  it  appears  in  Table  3.3  is  passed  to  the  subroutine 
SHALG  for  scheduling.  Changing  the  scheduling  algorithm  consists  of  providing 
a  new  version  of  SHALG. 

Two  types  of  scheduling  changes  are  possible:  changes  in  the  method 
for  determining  the  relative  frequencies  of  the  search  terms  and  changes  in 
policy  concerning  the  scheduling  of  subexpressions  within  a  complicated  search 
request.  Changes  of  the  first  type  can  be  accomplished  simply  by  adding  to  the 
existing routine a preprocessing step which changes the contents of Word 4 of
the  Tag  Table  entry  for  each  search  term.  This  word  is  reserved  for  an  index 
which  reflects  the  expected  relative  frequency  of  the  term.  In  the  present 
implementation,  large  index  values  correspond  to  low  frequencies,  and  the 
index  in  use  is  the  length  of  the  term  in  bytes.  If  scheduling  were  to  be 
based,  instead,  on  least  common  bigrams,  one  could  list  all  bigrams  in  the 
data  base  according  to  frequency  from  most  frequent  to  least,  determine  which 
bigram  in  the  input  string  lay  closest  to  the  end  of  the  list,  and  use  its 
position  in  the  list  as  the  index  for  the  term.  No  other  changes  in  procedure 
would  be  required. 
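The bigram alternative can be sketched in Python. The function `bigram_index` and the bigram list are hypothetical. The list is not derived from the experimental data base, and positions are counted from 0. The report does not say how bigrams absent from the list should be treated. This sketch treats them as rarer than any listed bigram, and gives a one-character term the index 0.

```python
def bigram_index(term, bigrams_by_frequency):
    """Position, in a most-to-least-frequent list, of the term's rarest bigram.

    Larger index = rarer term, the same sense as the length index.
    """
    rank = {bg: i for i, bg in enumerate(bigrams_by_frequency)}
    unseen = len(bigrams_by_frequency)       # assumed rank for unlisted bigrams
    pairs = [term[i:i + 2] for i in range(len(term) - 1)]
    return max((rank.get(p, unseen) for p in pairs), default=0)


# Hypothetical frequency list, most frequent first.
BIGRAMS = ["TH", "HE", "IN", "ER", "AN", "RE", "ON", "AT"]
print(bigram_index("THE", BIGRAMS), bigram_index("RATE", BIGRAMS))   # 1 8
```

Only the value written into Word 4 changes, as the text notes, so the rest of the scheduling procedure would be unaffected.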

Changes  of  the  second  type,  basic  scheduling  policy,  require  substitution 
of  a  new  version  of  SHALG. 

4.3  Success  and  Failure  Linkage 

Search  scheduling  and  control  are  accomplished  by  means  of  two  systems 
of  pointers,  called  success  and  failure  links,  in  columns  C  and  D  of  the  Tag 
Table.  Consider  the  search  request 

FIND  A*B*C  +  D*E*F,  (4.1) 

and  suppose  that  the  search  terms  were  processed  in  their  original  order  from 
left  to  right.  The  first  step  of  the  procedure  would  be  to  scan  the  text  for 
term  A.  If  A  were  found,  then  a  search  for  B  would  be  conducted.  However,  if 
the  search  for  A  failed,  searching  for  either  B  or  C  would  be  unnecessary,  and 
processing  could  continue  with  term  D.  Thus,  the  success  link  for  A  would  be 
B;  and  the  failure  link  for  A  would  be  D.  Similarly,  the  success  link  for  B 
would  be  C,  and  the  failure  link  for  B  would  be  D.  The  complete  system  of 
success  and  failure  pointers  for  this  example  is  shown  in  Table  4.1.  Notice 
that  each  of  a  series  of  entities  connected  by  the  operator  "*"  has  the  same 
failure link while each of a series of entities joined by the operator "+"
has  the  same  success  link.  Eventually  every  path  through  this  structure  leads 
to  the  condition  of  overall  success  or  failure  for  the  search. 

The  process  of  constructing  this  linkage  is  referred  to  as  search 
scheduling. 


Table 4.1. Success and Failure Linkage for Expression 4.1

     TERM     SUCCESS LINK     FAILURE LINK

      A       B                D
      B       C                D
      C       SUCCESS          D
      D       E                FAILURE
      E       F                FAILURE
      F       SUCCESS          FAILURE
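For an expression without parentheses, the linkage follows mechanically from the two observations above. The Python sketch below reproduces Table 4.1 for Expression 4.1. The function name is hypothetical, and the terms are assumed to be listed already in processing order.

```python
SUCCESS, FAILURE = "SUCCESS", "FAILURE"


def link_sum_of_products(groups: list[list[str]]) -> dict[str, tuple[str, str]]:
    """Return {term: (success link, failure link)} for terms joined by "*"
    within each group and groups joined by "+", processed in the order given."""
    links = {}
    for g, group in enumerate(groups):
        # All terms of a "*" group share one failure link: the next group, or FAILURE.
        fail = groups[g + 1][0] if g + 1 < len(groups) else FAILURE
        for t, term in enumerate(group):
            # Success advances within the group; every group ends in the same SUCCESS.
            succ = group[t + 1] if t + 1 < len(group) else SUCCESS
            links[term] = (succ, fail)
    return links


for term, (succ, fail) in link_sum_of_products([["A", "B", "C"], ["D", "E", "F"]]).items():
    print(f"{term}  {succ:<8}  {fail}")
```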

4.4  Final  Organization  of  Tag  Table 

The  following  conventions  are  employed  by  the  retrieval  program  in 
constructing  success/failure  linkage  in  the  Tag  Table: 

1.  Both  success  and  failure  pointers  for  a  given  term  point  to  the 
Tag  Table  success  column  for  the  next  term  to  be  processed. 

2.  Ultimate  success  or  failure  is  indicated  by  the  entry  :FFFF:  in 
the  appropriate  link. 

3.  Both pointers associated with the Tag Table entry for a
parenthesized expression point to the first entity to be processed
inside the parentheses. The first entry in the table, which
represents the entire search request, obeys this rule except that
its success pointer is set to :0000:.

4.  The success and failure links out of a parenthesized expression
are recorded with the last term(s) to be processed inside the
parentheses.

Recall that after the parsing procedure, all entities on a given linked
list in the Tag Table which are logically connected by the operator "*" have the
same tag, and entities joined by "+" have tags with the same level value but
different values in the disjunction field. On the assumption that progressively
longer or more complicated expressions are progressively less likely
to occur, the scheduling procedure arranges the entities on each list for
consideration in the following order:

1.  Tags which represent only strings and no parenthesized
expressions (Group I tags) are considered first.

2.  Tags in Group I which represent single strings are arranged in
order from shortest string (most likely) to longest (least
likely).

3.  Tags  in  Group  I  which  represent  multiple  strings  are  arranged 
in  order  from  fewest  strings  to  most. 

4.  Strings  associated  with  each  tag  in  step  3  are  considered  in 
order  from  longest  to  shortest. 

5.  Tags which represent parenthesized expressions and possibly
individual strings as well (Group II tags) are arranged in order
first from fewest strings to most and then from fewest parenthesized
expressions to most.

6.  Strings  associated  with  each  tag  in  step  5  are  arranged  in  order 
from  longest  to  shortest. 

7.  Parenthesized  expressions  associated  with  each  tag  in  step  5  are 
arranged  in  order  from  fewest  to  most  enclosed  search  strings. 

Linkage between lists results from entry into or exit from a parenthesized
expression.
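Rules 1 through 7 amount to a sort with a compound key. The Python sketch below is illustrative only. The classes `Term` and `Paren` and the function `schedule` are hypothetical, and string length serves as the frequency index. Python's stable sort keeps tied entities in their original order; the text does not say how ties are broken. Given the level 3 list of Expression 4.2 below, it yields D, G, (5, Y, A, (2, (4, which is the order followed by the links in Table 4.2.

```python
from dataclasses import dataclass


@dataclass
class Term:
    name: str
    length: int            # length in bytes: the present frequency index


@dataclass
class Paren:
    name: str
    enclosed_strings: int  # number of search strings inside the parentheses


def schedule(tags: dict[str, list]) -> list[str]:
    """Order the entities of one linked list. `tags` maps each tag value to the
    entities (Term or Paren) that carry it, i.e. one "*" group per tag."""
    group1, group2 = [], []
    for entities in tags.values():
        strings = sorted((e for e in entities if isinstance(e, Term)),
                         key=lambda s: s.length, reverse=True)   # rules 4 and 6
        parens = sorted((e for e in entities if isinstance(e, Paren)),
                        key=lambda p: p.enclosed_strings)        # rule 7
        if parens:
            # Rule 5: Group II, fewest strings first, then fewest parentheses.
            group2.append(((len(strings), len(parens)), strings + parens))
        else:
            # Rules 2 and 3: Group I, single strings shortest first, then by count.
            single_len = strings[0].length if len(strings) == 1 else 0
            group1.append(((len(strings), single_len), strings))
    ordered = []
    # Rule 1: every Group I tag precedes every Group II tag.
    for _, entities in sorted(group1, key=lambda g: g[0]) + sorted(group2, key=lambda g: g[0]):
        ordered.extend(e.name for e in entities)
    return ordered


level3 = {"0300": [Term("D", 4)],
          "0301": [Paren("(2", 1), Term("Y", 7), Term("A", 5), Paren("(4", 2)],
          "0302": [Term("G", 3), Paren("(5", 4)]}
print(schedule(level3))  # prints: ['D', 'G', '(5', 'Y', 'A', '(2', '(4']
```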

These rules are illustrated in Table 4.2, which shows the complete
Tag Table for the example of Expression 3.1, repeated for convenience as
Expression 4.2. The lengths assumed for the various terms are given in
parentheses below the terms in 4.2. Pointers in the table are interpreted as for
Table 3.3, where a pointer to "T21C" refers to column C of entry number 21.


FIND  E  *  C  * (1 D  + (2 (3 F )3 )2 *  Y  *  A  *
     (7)   (2)     (4)        (9)        (7)   (5)

     (4 B  *  Z )4 +  G  * (5 H  +  I  *  J  +  K )5 )1 +  L  *  M        (4.2)
       (6)   (8)     (3)     (6)   (4)   (7)   (8)        (5)   (8)


Table 4.2. Complete Tag Table for Expression 4.2

ENTRY   ENTITY      A (TAG)   C (SUCCESS)   D (FAILURE)

T1      (request)   0100      :0000:        T21C
T2      E           0200      T3C           :FFFF:
T3      C           0200      T4C           :FFFF:
T4      (1          0200      T5C           T5C
T5      D           0300      :FFFF:        T14C
T6      (2          0301      T7C           T7C
T7      (3          0400      T8C           T8C
T8      F           0500      T11C          :FFFF:
T9      Y           0301      T10C          :FFFF:
T10     A           0301      T6C           :FFFF:
T11     (4          0301      T13C          T13C
T12     B           0400      :FFFF:        :FFFF:
T13     Z           0400      T12C          :FFFF:
T14     G           0302      T15C          T9C
T15     (5          0302      T16C          T16C
T16     H           0400      :FFFF:        T19C
T17     I           0401      :FFFF:        T9C
T18     J           0401      T17C          T9C
T19     K           0402      :FFFF:        T18C
T20     L           0201      :FFFF:        T2C
T21     M           0201      T20C          T2C
T22                 0000
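To use Table 4.2, consult first column D of entry T1, which contains
a pointer to the first term to be located (M). Then, at each entry, follow the
success link (column C) if the term is found and the failure link (column D) if
it is not. Both links of a parenthesized expression lead to its first inner
entity, and :FFFF: ends the search with success or failure according to the
column in which it appears.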
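The walk described above can be checked with a short Python sketch. It is illustrative only: `TABLE_4_2` and `run_search` are hypothetical names. The entity of each entry is read from Expression 4.2, links are stored as entry numbers, and a search context is modelled simply as the set of terms it contains.

```python
END = 0xFFFF  # ":FFFF:": ultimate success in column C, ultimate failure in column D

# Table 4.2 as {entry number: (entity, column C success link, column D failure link)}.
# A link such as "T21C" is stored as 21.
TABLE_4_2 = {
    1: ("request", 0x0000, 21),
    2: ("E", 3, END), 3: ("C", 4, END), 4: ("(1", 5, 5), 5: ("D", END, 14),
    6: ("(2", 7, 7), 7: ("(3", 8, 8), 8: ("F", 11, END), 9: ("Y", 10, END),
    10: ("A", 6, END), 11: ("(4", 13, 13), 12: ("B", END, END), 13: ("Z", 12, END),
    14: ("G", 15, 9), 15: ("(5", 16, 16), 16: ("H", END, 19), 17: ("I", END, 9),
    18: ("J", 17, 9), 19: ("K", END, 18), 20: ("L", END, 2), 21: ("M", 20, 2),
}


def run_search(table: dict, present: set[str]) -> bool:
    """Follow the links for a context in which exactly the terms in `present` occur."""
    entry = table[1][2]                  # column D of T1: the first term to look for
    while True:
        entity, success, failure = table[entry]
        if entity.startswith("("):       # parenthesized expression: both links lead inside
            entry = success
            continue
        found = entity in present
        entry = success if found else failure
        if entry == END:                 # :FFFF: ends the search with this outcome
            return found


print(run_search(TABLE_4_2, {"E", "C", "G", "K"}))  # prints: True
```

Every path through the table ends in a :FFFF: link, so the loop terminates. Comparing its result with a direct evaluation of Expression 4.2 over all combinations of present terms is a convenient check on the table.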

4.5  Processing  Procedures 

Most  of  the  processing  in  SHALG  consists  of  selecting  various  groups 
of  tags  and  sorting  them  into  the  desired  order.  After  this  process  has  been 
completed,  the  Tag  Table  appears  much  as  it  did  upon  entry  to  the  routine  except 
that  the  order  of  the  elements  on  the  various  linked  lists  has  been  changed.  It 
is  now  time  to  make  the  SUCCESS/FAILURE  assignments. 

Success  and  failure  assignments  proceed  in  a  straightforward  manner 
according  to  the  principles  in  the  previous  section  and  the  following  rules: 

1.  The  success  link  for  the  first  entry  in  the  Tag  Table  is  set 
to  :0000:;  the  failure  link  points  to  the  first  entity  to  be 
processed. 

2.  If the current entity is the last element on a chain having a
particular tag, then SUCCESS means success for the chain; otherwise
the success link points to the next element on the chain
(which necessarily has the same tag as the current element).

3.  If some other entry X further down the chain has a tag different
from that of the current entity, then the failure link points
to the first such X; otherwise, FAILURE means failure for the
chain.

4.  By virtue of the way the chains are constructed, only one chain
will ever occur at level 2, and success or failure on that chain
implies success or failure for the search as a whole. (Success
or failure for a higher level chain, on the other hand, implies
success or failure in satisfying the requirements of a parenthesized
expression.)

5.  When  a  tag  representing  a  parenthesized  expression  is  encountered, 
the  success  and  failure  links  are  determined  in  the  standard  way 
and  saved  in  a  stack  along  with  the  link  to  the  next  element  on 
the  current  chain.  Processing  of  the  current  chain  is  suspended, 
and  success/failure  assignments  are  completed  for  the  terms  inside 
the  parenthesized  expression.  "Ultimate"  success  and  failure  links 
for  the  parenthesized  quantity  are  obtained  from  the  next  lower 
level  in  the  stack. 

6.  When  all  terms  inside  a  parenthesized  expression  have  been  processed, 
the  link  is  recovered  from  the  top  of  the  stack,  and  processing  of 
the  next  lower  level  chain  continues. 

7.  The  process  terminates  when  the  end  of  the  level  2  chain  is  detected. 
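Rules 1 through 7 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the SHALG code:

* The names `Paren`, `first`, and `assign` are hypothetical.
* Each list is assumed to be already in scheduled order, with one inner list per tag.
* Links are recorded by entity name rather than by Tag Table address.
* Recursion stands in for the explicit stack of rules 5 and 6: each recursive call carries the "ultimate" links for one parenthesized expression.

Applied to Expression 4.2, the sketch reproduces the links of Table 4.2.

```python
from dataclasses import dataclass

SUCCESS, FAILURE = "SUCCESS", "FAILURE"


@dataclass
class Paren:
    name: str
    groups: list  # scheduled "*" chains, one per tag; items are str or Paren


def first(item) -> str:
    """Entry a link points to: a term, or the Tag Table entry of a parenthesis."""
    return item.name if isinstance(item, Paren) else item


def assign(groups: list, succ_out: str, fail_out: str, links: dict) -> dict:
    """Fill links[name] = (success, failure) for every entity in `groups`."""
    for g, chain in enumerate(groups):
        # Rule 3: failure goes to the first entity further down with a different tag.
        fail = first(groups[g + 1][0]) if g + 1 < len(groups) else fail_out
        for i, item in enumerate(chain):
            # Rule 2: success goes to the next element with the same tag.
            succ = first(chain[i + 1]) if i + 1 < len(chain) else succ_out
            if isinstance(item, Paren):
                # Rule 5: both links of the entry lead inside, and the links the
                # expression would have had become the "ultimate" links within it.
                inner = first(item.groups[0][0])
                links[item.name] = (inner, inner)
                assign(item.groups, succ, fail, links)
            else:
                links[item] = (succ, fail)
    return links


# Expression 4.2 in scheduled order.
p5 = Paren("(5", [["H"], ["K"], ["J", "I"]])
p2 = Paren("(2", [[Paren("(3", [["F"]])]])
p4 = Paren("(4", [["Z", "B"]])
p1 = Paren("(1", [["D"], ["G", p5], ["Y", "A", p2, p4]])
level2 = [["M", "L"], ["E", "C", p1]]

links = assign(level2, SUCCESS, FAILURE, {})
print("T1 failure link ->", first(level2[0][0]))  # prints: T1 failure link -> M
print(links["G"], links["F"])                     # prints: ('(5', 'Y') ('(4', 'FAILURE')
```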
5.0  PRINT Statement Processing

PRINT  statement  processing  consists  of  locating  the  requested  context 
names  in  the  CTXT  Table  and  marking  the  entries  appropriately.  The  data  portion 
of  an  entry  in  the  CTXT  Table  consists  of  4  bytes  of  the  form 

8X  YY  8C  ZZ

where YY and ZZ are the beginning and end delimiters, respectively, for the
associated context, and X is either C or D depending on whether or not printing
of  the  associated  context  has  been  requested.  Therefore,  the  PRINT  statement 
processor  first  "clears"  the  CTXT  Table,  replacing  each  X  with  C.  It  then 
isolates  context  names  in  the  input  line  and  changes  the  X's  in  the  corresponding 
data  fields  from  C's  to  D's. 
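A minimal Python sketch of the clear-then-mark step follows, with several hypothetical details. The CTXT Table is modelled as a dictionary from context name to a 4-byte `bytearray` laid out as described above. The delimiter values are borrowed from Table 6.1. An unrecognized name simply raises `KeyError` here; this section does not describe how the real processor handles that case.

```python
NOT_PRINTED, PRINTED = 0x8C, 0x8D   # first byte 8X with X = C or X = D


def mark_print_contexts(ctxt: dict[str, bytearray], requested: list[str]) -> None:
    """Clear every entry to 8C, then set the requested contexts to 8D."""
    for field in ctxt.values():
        field[0] = NOT_PRINTED
    for name in requested:
        ctxt[name][0] = PRINTED     # KeyError for a name not in the table


# Data fields: 8X, start delimiter, 8C, end delimiter (values from Table 6.1).
ctxt = {
    "AUTHOR": bytearray([0x8C, 0xD1, 0x8C, 0xD2]),
    "TITLE": bytearray([0x8C, 0xD3, 0x8C, 0xD4]),
    "ABSTRACT": bytearray([0x8C, 0xC4, 0x8C, 0xC5]),
}
mark_print_contexts(ctxt, ["TITLE", "ABSTRACT"])
print([name for name, field in ctxt.items() if field[0] == PRINTED])  # prints: ['TITLE', 'ABSTRACT']
```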
6.0  Search Control

6.1  Detailed  Data  Base  Structure 

Figure 2.1 in Part I of this report describes the hierarchical structure
of documents in the data base. Table 6.1 of this section lists the section names
together with the hexadecimal start and end characters for each document
subdivision. Note the new division, "DIRECTORY", which has been inserted ahead of
"DOCUMENT". This is a non-searchable context directory used by the control
programs to facilitate searching.

The  arrangement  of  the  various  context  sections  and  delimiters  as  they 
would  appear  in  storage  is  illustrated  in  Figure  6.1.  Three  different  types  of 
context  delimiters  (corresponding  to  prefix  characters  "C",  "D",  and  "E")  are 
available. 

6.1.1  C-Delimiters

C-delimiters  are  used  to  identify  major  sections  of  a  document  such  as 
ABSTRACT,  bibliographic  DATA,  etc.  Each  of  these  must  occur  once  and  only  once 
in  any  given  document,  even  if  some  of  the  sections  they  identify  are  absent. 
If,  for  example,  a  document  contains  no  entries  under  INDEX  or  KEYS,  then  these 
sections  should  be  represented  by  their  delimiters  alone,  as  follows: 

...  DCC2C3C4ECEE  ...   . 

C-delimiters must appear within a document in increasing numerical order from
C0 through C9.

6.1.2  D-Delimiters 

D-delimiters  identify  individual    items  of  bibliographic  data,  such  as 
titles  or  authors'   names,  which  are  to  be  available  for  direct  searching.     They 


Table 6.1. Context Delimiting Characters

                          Context Delimiters
                            (Hexadecimal)
Section Name              Start        End

DIRECTORY                 C0           C1
DOCUMENT, ARTICLE         C1           C9
DATA                      C1           C2
AUTHOR                    D1           D2
TITLE                     D3           D4
SOURCE                    D5           D6
DATE                      D7           D8
PAGE, PAGES               D9           DA
MISC                      DB           DC
INDEX                     C2           C3
KEYS                      C3           C4
TEXT                      C4           C7
ABSTRACT                  C4           C5
BODY                      C5           C6
NOTES                     C6           C7
REFERENCES, REFS          C7           C8
COMMENTS                  C8           C9
PARAGRAPH                 EC           EC
SENTENCE                  EE           EE
00C0 ... (directory data: 10 words) ... ECEE880000000000
C1D1 ... (author) ... D2D3 ... (title) ... D4D5 ...
(source) ... D6D7 ... (date) ... D8D9 ... (pages) ...
... DADB ... (miscellaneous bibliographic data) ... DC
C2 ... (index) ... C3 ... (keys) ... C4ECEE ...
(abstract) ... EE ... EE ... EEEC EEC5EC ...
(body) ... EE ... EE ... EEEC EEC6EC ...
(notes) ... EE ... EE ... EEEC ... EE
C7EC ... (references) ... EE ... EE ... EEEC
... EEC8EC ... (comments) ... EE ... EE ... EEEC ...
EE ... EEECC9


Figure 6.1. Document Organization in Storage
make it possible to search rapidly for all items by a particular author or
from  a  particular  journal,  or  from  selected  years.  As  with  C-delimiters,  each 
D-delimiter  must  occur  once  and  only  once  within  each  document.  However,  strict 
numerical  order  need  not  be  maintained,  i.e.,  individual  items  of  bibliographic 
information  need  not  appear  in  any  fixed  order. 

6.1.3  E-Delimiters 

E-delimiters  separate  repeated  elements  of  text,  such  as  paragraphs 
and  sentences,  which  are  to  be  separately  searchable.  One  sentence  delimiter 
(EE)  appears  at  the  beginning  and  one  at  the  end  of  each  sentence  in  the  text. 
Only  one  delimiter  need  appear  between  two  sentences.  Similar  statements  hold 
for  the  paragraph  delimiter  (EC).  Each  major  document  section  which  can  contain 
E-type  subdivisions  (see  Figure  6.1)  must  contain  at  least  one  paragraph  symbol 
and  one  sentence  symbol,  even  if  no  other  text  is  present.  Hence,  if  a  certain 
document  contained  no  reference  list  and  no  comments,  these  sections  would  appear 
as  follows: 

...  C7EEECC8EEECC9  . 

There  is  no  theoretical  restriction  on  the  length  of  any  document 
section  or  subsection  in  this  system;  however,  as  a  matter  of  convenience  for 
the  current  implementation,  it  is  assumed  that  the  entire  bibliographic  data 
section  can  be  contained  in  core  at  once  for  searching.  This  limits  that  one 
section  to  a  total  length  of  approximately  4000  characters. 
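The rules in Sections 6.1.1 through 6.1.3 lend themselves to a mechanical check. The Python sketch below is hypothetical; `check_delimiters` is not part of the retrieval program. It assumes that delimiter bytes never occur as ordinary text or directory data. Following Figure 6.1, it also assumes that the sections beginning at C4 through C8 are the ones that must carry paragraph and sentence symbols.

```python
C_DELIMS = list(range(0xC0, 0xCA))   # C0 .. C9
D_DELIMS = list(range(0xD1, 0xDD))   # D1 .. DC (start/end pairs D1-D2 ... DB-DC)
PARAGRAPH, SENTENCE = 0xEC, 0xEE


def check_delimiters(doc: bytes) -> list[str]:
    """Return a list of rule violations (empty if the document is well formed)."""
    problems = []
    # 6.1.1: each C-delimiter exactly once, in increasing order C0..C9.
    c_ok = [b for b in doc if 0xC0 <= b <= 0xC9] == C_DELIMS
    if not c_ok:
        problems.append("C-delimiters must appear once each, in order C0..C9")
    # 6.1.2: each D-delimiter exactly once; the items may appear in any order.
    for d in D_DELIMS:
        if doc.count(d) != 1:
            problems.append(f"D-delimiter {d:02X} must occur exactly once")
    # 6.1.3: sections C4..C8 each need at least one paragraph and one sentence symbol.
    if c_ok:
        for k in range(0xC4, 0xC9):
            section = doc[doc.index(k) + 1:doc.index(k + 1)]
            if PARAGRAPH not in section or SENTENCE not in section:
                problems.append(f"section {k:02X}..{k + 1:02X} needs EC and EE")
    return problems


# A skeleton document: TITLE stored before AUTHOR, no INDEX or KEYS, one short sentence.
items = (bytes([0xD3]) + b"title" + bytes([0xD4, 0xD1]) + b"author"
         + bytes([0xD2, 0xD5, 0xD6, 0xD7, 0xD8, 0xD9, 0xDA, 0xDB, 0xDC]))
text = (bytes([0xC4, 0xEC, 0xEE]) + b"one"
        + bytes([0xEE, 0xC5, 0xEC, 0xEE, 0xC6, 0xEC, 0xEE,
                 0xC7, 0xEE, 0xEC, 0xC8, 0xEE, 0xEC, 0xC9]))
doc = bytes([0xC0, 0xC1]) + items + bytes([0xC2, 0xC3]) + text
print(check_delimiters(doc))  # prints: []
```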

6.2  Directory  and  Control  Data 

The  first  line  of  Figure  6.1  shows  several  words  of  control 
information stored at the beginning of a document between the delimiters
C0 and C1. This information, which is generated when a document is added to
the  data  base,  consists  mainly  of  pointers  to  the  disk  addresses  at  which  the 
various  major  sections  of  the  document  begin.  The  control  programs  use  these 
pointers  to  locate  the  beginnings  of  major  sections  of  text  without  having  to 
search  sequentially  from  the  beginning  of  the  document.  This  directory  can 
also  be  used  to  locate  the  "next"  document  in  the  file  without  reference  to 
independent  system  tables,  thus  reducing  requirements  for  core  storage  (a 
limited  resource)  or  for  disk  access. 

Table 6.2 shows the contents of the 10 words in the document directory.
The first eight of these words contain the addresses of the disk sectors in
which  the  eight  major  document  sections  begin.  It  is  important  to  note  that 
directory  pointers  do  not  indicate  the  exact  location  of  the  first  character  of 
a  document  section,  but  only  the  address  of  the  disk  sector  which  contains  that 
first  character.  A  sector  is  the  smallest  addressable  unit  of  data  on  the  disk 
and,  in  the  present  system,  contains  about  900  characters. 

Directory  words  nine  and  ten  contain  sector  addresses,  respectively, 
for  the  current  end  of  the  document  and  for  the  end  of  the  disk  space  reserved 
for  the  document.  Typically  several  unused  sectors  are  reserved  at  the  end  of 
a  document  for  storing  users'  comments.  When  comments  are  present,  they  will 
constitute  a  searchable  field  and  will  affect  the  value  of  the  end  of  text 
pointer. 

The  four  words  which  follow  the  directory  (see  Figure  6.1)  contain 
"stop"  characters  for  certain  searches  which  proceed  in  the  reverse  direction 
and  empty  space  for  use  by  the  routines  which  format  text  for  printing. 

In  addition  to  the  14  words  of  directory  and  control  data  stored  with 
each  document,  twelve  locations  in  core  are   permanently  reserved  for  control 
Table 6.2. Directory Contents

WORD  1  -  Disk  sector  address  of  start  of  DATA  section 

WORD  2  -  Disk  sector  address  of  start  of  INDEX  section 

WORD  3  -  Disk  sector  address  of  start  of  KEYS  section 

WORD  4  -  Disk  sector  address  of  start  of  ABSTRACT  section 

WORD  5  -  Disk  sector  address  of  start  of  BODY  section 

WORD  6  -  Disk  sector  address  of  start  of  NOTES  section 

WORD  7  -  Disk  sector  address  of  start  of  REFERENCES  section 

WORD  8  -  Disk  sector  address  of  start  of  COMMENTS  section 

WORD  9  -  Disk  sector  address  of  the  current  end  of  text 

WORD  10  -  Disk  sector  address  of  the  end  of  allocated  disk 
space  for  this  document 


purposes: ten consecutive locations labelled CNTRL00--CNTRL09 immediately
preceding the buffer space used for document text, and two other locations labelled
DISKLIM1 and DISKLIM2, which contain, respectively, the disk sector addresses
of the beginning and end of the data base. All disk space between these two
addresses is assumed by the program to be allocated to document text, although it
need  not  all  be  filled.  The  allocation  of  disk  space  to  individual  documents  is 
controlled  by  directory  information  in  the  affected  documents.  No  master  disk 
directory  is  required  by  the  retrieval  program  although  it  will  probably  become 
desirable  to  implement  one  at  some  later  date. 

The  use  of  control  words  CNTRL00--09  is  defined  in  Table  6.3.  They 
provide  a  means  for  establishing  correspondence  between  the  physical  locations 
of  disk  sectors  in  core  and  the  disk  sector  addresses  listed  in  the  document 
directories. 




Words  CNTRL01  and  CNTRL02  are  loaded  by  the  S-Level  program  before 
executing  a  disk  read  instruction.  The  remainder  of  the  information  is  supplied 
by  the  microprogram  as  the  read  proceeds. 


Table 6.3. Control Word Assignments

CNTRL00     Byte address of end-of-buffer character (:88:),
            supplied by microcode after disk read
CNTRL01     Disk address of first sector in core
CNTRL02     Byte address of first character from first sector
            read into S-Memory
CNTRL03     Byte address of first character from second sector
            read into S-Memory
   .
   .
   .
CNTRL09     Byte address of first character from eighth sector
            read into S-Memory

6.3  Search  Control  Procedures 

Several technical problems arise from the extreme variation which
exists in the lengths of the context units to be considered and from the fixed,
somewhat limited space available in memory. In some cases arbitrary design
decisions, hopefully in accord with the specifications stated in Section 1, have
been required; these are always potentially subject to revision.

6.3.1  Search  Types 

First,  consider  the  search  request 

FIND  'TERM A' * 'TERM B' + 'TERM C' * 'TERM D' IN ARTICLE,        (6.1)


and suppose that the terms have been scheduled in the manner shown in Table 6.4.
Suppose further that some document, K, contains enough text to fill the available
memory N times, where N > 1.


Table 6.4. Search Schedule for Expression 6.1

TERM        SUCCESS POINTER     FAILURE POINTER

START       :0000:              TERM A
TERM A      TERM B              TERM C
TERM B      SUCCESS             TERM C
TERM C      TERM D              FAILURE
TERM D      SUCCESS             FAILURE

The  search  scheduling  procedure  described  in  Section  4  requires  an 
initial  search  for  TERM  A  to  continue  until  some  occurrence  of  TERM  A  has  been 
found  or  until  the  entire  document  has  been  scanned.  If  TERM  A  appears  in 
Document  K,  then  the  process  must  be  repeated  for  TERM  B,  etc.,  until  the  success 
or  failure  of  the  search  has  finally  been  established.  This  may  require  each 
memory  load  of  text  in  Document  K  to  be  read  from  disk  several  times  in  the 
course  of  the  search. 

In  order  to  avoid  excessive  disk  handling,  an  alternative  procedure 
has  been  adopted  for  searching  major  context  sections,  i.e.,  context  units 
identified  by  C-type  delimiters  (Section  6.1.1).  Each  buffer  is  searched  in 
turn  for  all  terms  in  the  search  request  and,  when  a  string  is  located,  the 
high  order  bit  of  its  tag  is  set  to  "1"  in  the  TAG  Table.  After  a  buffer  has 
been  completely  processed  in  this  way,  the  tags  are  used  to  determine  whether  or 
not  the  logical  requirements  of  the  search  request  have  been  satisfied.  If  not, 
more  text  is  read  from  disk  and  the  search  continues  for  those  strings  not  yet 
found. 
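The buffer-at-a-time procedure can be sketched in Python as an illustration under assumptions. The names are hypothetical, each buffer is modelled as a string, and the tag words are made up. After each buffer, the sketch tests the logical requirements by walking the success and failure links of Table 6.4, with the FOUND bits as input. That is one plausible way to carry out the test, not a description of the actual routine.

```python
FOUND = 0x8000  # high-order bit of a tag word: "this string has been located"


def satisfied(links: dict, start: str, tags: dict) -> bool:
    """Walk the success/failure links, treating a term as found if its FOUND bit is set."""
    term = start
    while True:
        success, failure = links[term]
        term = success if tags[term] & FOUND else failure
        if term in ("SUCCESS", "FAILURE"):
            return term == "SUCCESS"


def search_by_buffers(buffers, tags: dict, links: dict, start: str):
    """Return the index of the memory load after which the request is satisfied,
    or None if every buffer of the section has been searched without success."""
    tags = dict(tags)                                # work on a copy of the tag words
    for n, buffer in enumerate(buffers):
        for term, tag in list(tags.items()):
            if not tag & FOUND and term in buffer:   # search only for strings not yet found
                tags[term] = tag | FOUND
        if satisfied(links, start, tags):
            return n
    return None


# Table 6.4 as {term: (success pointer, failure pointer)}; the tag words are made up.
LINKS_6_4 = {"TERM A": ("TERM B", "TERM C"), "TERM B": ("SUCCESS", "TERM C"),
             "TERM C": ("TERM D", "FAILURE"), "TERM D": ("SUCCESS", "FAILURE")}
TAGS = {"TERM A": 0x0200, "TERM B": 0x0200, "TERM C": 0x0201, "TERM D": 0x0201}
buffers = ["... TERM C ...", "... TERM A ...", "... TERM B ..."]
print(search_by_buffers(buffers, TAGS, LINKS_6_4, "TERM A"))  # prints: 2
```

A failure cannot be declared until the last buffer has been searched, since a string not yet found may still appear in a later memory load.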

Now consider the example of 6.1 and Table 6.4 with the search context
changed to PARAGRAPH. Normally several complete paragraphs will fit into the
buffer space available, and it becomes quite reasonable to conduct the search in
"scheduled order" on, say, a paragraph by paragraph basis, thus avoiding some
unnecessary searching.

When the search context is short, a potential new source of
undesirable overhead appears, namely, that associated with setting up a large number
of unsuccessful searches in short context units. In such cases, an initial
"global" scan is employed, followed by a "local" search whenever a responding
context is tentatively identified.

Because  of  explicit  and  implicit  requirements  for  interaction  between 
the  search  and  print  control  routines,  the  choice  of  specific  "local"  and  "global" 
contexts  depends  on  what  contexts  are  to  be  printed  as  well  as  what  contexts  are 
to  be  searched.  Searches  are  divided  into  four  types,  as  shown  in  Table  6.5. 
The  first  section  of  the  table  defines  the  search  type,  and  the  remainder  lists 
the  search  contexts  which  are  employed  on  three  levels.  Level  0  context  delimiters 
define the total scope of the search within each document; Level 1 context
delimiters define the context in which searching is normally conducted (the "global"
context); Level 2 context delimiters define the range of the detailed searches
(the "local" context) which are sometimes required.

6.3.1.1  Type  I  Search 

The  requested  major  context  unit  is  searched  by  memory  loads  for  all 
search  terms  until  the  search  request  is  satisfied  or  the  document  is  rejected. 
Whenever  a  responding  document  is  detected,  the  PRINT  routine  is  called,  and  all 
requested  contexts  in  the  document  are  printed.  Searching  is  resumed  with  the 
next  document  in  the  data  base. 

6.3.1.2  Type  II  Search 

Processing  proceeds  exactly  as  for  a  Type  I  search  except  that  the 
"scheduled" search mode is employed since, by a design requirement, the entire
bibliographic DATA section is known to be in memory.

6.3.1.3  Type III Search

When  SENTENCE  or  PARAGRAPH  is  specified  as  the  FIND  context,  the 
search  is  necessarily  restricted  to  those  major  context  units  which  contain 
sentence  and  paragraph  subdivisions:  TEXT,  REFERENCES  and  COMMENTS.  Hence, 
Table  6.5  lists  a  Level  0  context  extending  from  the  beginning  of  the  abstract 
to  the  end  of  the  document.  For  Type  III  searches,  the  Level  1  context  is  again 
a  full  memory  load,  and  the  Level  2  context  is  the  same  as  the  FIND  context. 

A Level 1 search is conducted in "scheduled order" through the entire
memory until any search term has been located. When this occurs, the Level 1
search is suspended, and a complete "scheduled" search is conducted in the
responding Level 2 context (sentence or paragraph). If this Level 2 search is
successful, the PRINT routine is called, and printing proceeds as in Types I and
II. If the Level 2 search is not successful, the Level 1 search is resumed
exactly where it left off.
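A Type III search can be sketched in Python as follows. The details are hypothetical. `PARA` stands in for :EC:, and `satisfies` stands for the complete scheduled search of one paragraph. The `examined` set is an addition of this sketch: it avoids repeating the local search when a second term is found in the same paragraph.

```python
PARA = "\xec"   # stands in for the paragraph delimiter :EC:


def type_iii_search(buffer: str, terms: list[str], satisfies):
    """Yield each paragraph of `buffer` that satisfies the request.
    `satisfies(text)` performs the complete scheduled (Level 2) search."""
    examined = set()                  # start positions of paragraphs searched locally
    pos = 0
    while True:
        # Level 1 ("global") scan: the next occurrence of any search term.
        hits = [i for i in (buffer.find(t, pos) for t in terms) if i >= 0]
        if not hits:
            return
        hit = min(hits)
        # Level 2 ("local") context: the paragraph containing the hit.
        start = buffer.rfind(PARA, 0, hit) + 1
        end = buffer.find(PARA, hit)
        end = len(buffer) if end < 0 else end
        if start not in examined:
            examined.add(start)
            if satisfies(buffer[start:end]):
                yield buffer[start:end]
        pos = hit + 1                 # resume the Level 1 scan where it left off


def request(text: str) -> bool:      # Expression 6.1 evaluated on one paragraph
    return ("TERM A" in text and "TERM B" in text) or ("TERM C" in text and "TERM D" in text)


TERMS = ["TERM A", "TERM B", "TERM C", "TERM D"]
buffer = PARA + "TERM A only" + PARA + "TERM C and TERM D here" + PARA + "TERM B alone" + PARA
print(list(type_iii_search(buffer, TERMS, request)))  # prints: ['TERM C and TERM D here']
```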

6.3.1.4  Type  IV  Search 

This  case  differs  from  Type  III  in  that  either  sentence  or  paragraph 
has  been  selected  as  a  PRINT  context,  and  hence  it  will  be  necessary  to  print 
specific  small  context  units.  This  is  accomplished  without  a  great  deal  of 
extra  searching  or  bookkeeping  by  calling  the  output  routines  as  soon  as  such  a 
context  can  be  identified.  Control  passes  back  and  forth  between  the  routines 
which  control  searching  and  those  which  control  printing  in  such  a  way  that  a 
responding  context  unit  is  printed  immediately,  and  searching  is  resumed  in  the 
next  Level  1  or  Level  2  context  unit,  as  appropriate.  This  is  the  only  case  in 
which searching continues in a given document after part of that document has
been printed.

To facilitate this arrangement, the Level 0 context is Abstract-End,
as before, but the Level 1 search is conducted on a paragraph by paragraph basis.
If a paragraph is found to satisfy the search request, the search routine
conducts a sentence by sentence search, if necessary, within that responding
paragraph. When a search is completed successfully, the address of the responding
context unit is passed to the print control routines, which print the required
context unit and return control, along with the address of the last character
printed. Searching then resumes in the next sentence or paragraph, as
appropriate. This processing is complicated somewhat by the fact that any of the
four combinations of minor FIND and PRINT contexts is allowed:
PARAGRAPH/PARAGRAPH, PARAGRAPH/SENTENCE, SENTENCE/PARAGRAPH or SENTENCE/SENTENCE.

6.3.2  Sentence  and  Paragraph  Restrictions 

A  second  problem  arising  from  the  non-uniform  lengths  of  sentences  and 
paragraphs  is  that  it  is  very  inefficient  to  guarantee  that  every  memory  load  of 
text  begins  at  the  beginning  of  a  paragraph  and  ends  at  the  end  of  one.  In  fact, 
it  is  not  considered  practical  to  make  this  guarantee  even  for  sentences.  Hence, 
it  is  possible  for  a  user  to  request  a  search  for  a  character  string  which  begins 
in  one  buffer  and  ends  in  the  next,  or  to  specify  the  co-occurrence  of  two  terms 
which  actually  appear  together  in  the  required  context  but  which  do  not  lie  in 
the  same  memory  load.  In  such  cases  the  search  would  fail,  and  the  document  would 
not  be  retrieved.  Except  as  explained  under  "Type  I  Search",  no  attempt  is  made 
to  continue  a  particular  search  from  one  memory  buffer  to  another. 

An  effort  has  been  made  to  load  the  disk  in  such  a  way  as  to  minimize 
this  problem.  No  word  is  ever  divided  between  two  sectors  of  the  disk,  and 
sentences  are  so  divided  only  if  it  is  impractical  (in  terms  of  wasted  disk  space) 
or  impossible  to  do  otherwise. 
7.0  PRINT Control

Actual  formatting  and  printing  of  document  text  is  under  the  control 
of  two  routines,  PRNT  and  BLKP,  which  interact  with  the  search  procedures  as 
explained  in  Section  6. 

Whenever the print control routine (PRNT) is called, it first checks
certain flags to determine whether this is the first call, the last call, or an
intermediate call for the current document. An intermediate call indicates that a
Type IV search is in progress and a responding minor context has just been
located. In that case, the required paragraph or sentence is printed, and control
is returned to the search procedure.

When  a  first  or  last  call  is  received,  the  CTXT  table  is  scanned  to 
determine  what  context  should  be  printed  next,  and  a  check  is  made  to  determine 
whether  this  context  has  already  been  printed.  If  it  has,  the  CTXT  table  scan 
is  resumed;  if  not,  print  processing  proceeds.  If  the  next  context  to  be  printed 
is  SENTENCE  or  PARAGRAPH,  processing  begins  at  the  start  of  the  first  major  context 
section  which  contains  these  subdivisions  and  which  has  not  already  been  printed. 

Print  processing  consists  of  breaking  the  text  first  into  paragraphs 
and  then  into  individual  print  lines,  determining  whether  or  not  each  line 
contains  any  of  the  requested  search  terms  and,  if  so,  supplying  the  required 
marginal  characters. 

In  order  to  identify  those  lines  containing  search  terms,  it  is 
necessary  to  scan  the  entire  text  to  be  printed  once  for  each  term  in  the  search 
request.  (Thus,  this  facility  represents  a  fairly  large  penalty  in  terms  of 
processing  time,  program  space  and  program  complexity.)  The  search  is  conducted 
on a paragraph by paragraph basis*, and when a search term is found, a special
non-printable character, :88:, is placed in the first blank preceding the end
of the search term. When all search terms have been marked in this way, the
routine BLKP divides the paragraph into lines of maximum length 125 characters
ending with blank, supplies the appropriate marginal prefix ('**--' if the
line contains a search term or ' ' if it does not), and prints the line.

* If the PRINT context contains no paragraph subdivisions, then the search is
conducted in the full PRINT context, if in core, or in the full current text buffer,
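The line-breaking step performed by BLKP can be sketched in Python, with simplifications. `blkp` is a hypothetical function, and the prefix strings are placeholders. Each line is tested for search terms directly instead of carrying :88: markers. So, unlike the method described above, the sketch does not flag a line on which a term that was split across two lines merely ends.

```python
def blkp(paragraph: str, terms: list[str], width: int = 125,
         marked: str = "** ", plain: str = "   ") -> list[str]:
    """Break a paragraph at blanks into lines of at most `width` characters and
    prefix each line according to whether it contains a search term."""
    lines, current = [], ""
    for word in paragraph.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= width or not current:
            current = candidate          # the word fits (or stands alone on the line)
        else:
            lines.append(current)        # break at the blank before this word
            current = word
    if current:
        lines.append(current)
    return [(marked if any(t in line for t in terms) else plain) + line
            for line in lines]


for line in blkp("alpha beta gamma delta", ["gamma"], width=11):
    print(line)
```

Working on a whole paragraph at once, as the text describes, lets the line breaks be computed without copying lines elsewhere.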

In  order  to  conserve  memory  space  and  (hopefully)  reduce  overhead, 
print  processing  is  conducted  by  paragraphs  rather  than  by  lines,  and  it  takes 
place  entirely  within  the  original  text  buffer.  Lines  are  not  moved  to  a 
special  location  before  printing. 

After  a  paragraph  has  been  printed,  the  procedure  is  repeated  for  the 
next  paragraph  or  the  CTXT  table  scan  is  resumed  to  locate  the  next  context  for 
printing.  When  all  requested  printing  has  been  completed,  control  is  returned 
to  the  search  routines,  and  processing  begins  in  the  next  document. 
8.0  Implementation of Negation

The  user  does  not  presently  have  the  capability  of  specifying  that  a 
term  should  not  be  present  in  a  search  context.  However,  this  facility  can  be 
implemented  easily  as  follows. 

The  second,  third  and  fourth  bits  from  the  left  of  the  tag  word  are 
not  used  for  any  purpose:  one  of  these  could  be  designated  as  the  "negation 
bit".  In  constructing  a  new  tag,  then,  the  parsing  procedure  would  first  set 
this  bit  to  zero,  and  then  reverse  its  state  for  each  occurrence  of  the  "NOT" 
operator  before  the  entity  to  which  the  tag  is  assigned.  The  remainder  of  the 
tag  would  be  constructed  as  before. 

The rest of the scheduling and searching procedures could work
exactly as before except that the SUCCESS and FAILURE pointers would be
interchanged for any entity which was negated.
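The proposed tag handling can be sketched in Python, with two assumptions. The first concerns the tag layout: as the four-digit tags in Table 4.2 suggest, the level is taken to occupy the high-order byte and the disjunction field the low-order byte. The second is the choice of the second bit from the left as the negation bit, which is hypothetical. The sketch covers a negated string only. For a negated parenthesized expression, the exchange would apply to the "ultimate" links handed to its contents, which is not shown.

```python
FOUND = 0x8000     # high-order bit, set when a string is located (Section 6.3)
NEGATED = 0x4000   # hypothetical choice: the second bit from the left


def build_tag(level: int, disjunction: int, not_count: int) -> int:
    """Assemble a tag word, flipping the negation bit once per NOT operator."""
    tag = (level << 8) | disjunction   # assumed layout: 0x0301 = level 3, disjunction 1
    tag &= ~NEGATED                    # the bit starts at zero ...
    for _ in range(not_count):
        tag ^= NEGATED                 # ... and reverses for each NOT before the entity
    return tag


def links_for(tag: int, success: str, failure: str) -> tuple[str, str]:
    """Scheduling is unchanged; a negated entity simply has its links exchanged."""
    return (failure, success) if tag & NEGATED else (success, failure)


tag = build_tag(3, 1, not_count=1)
print(hex(tag), links_for(tag, "T9C", ":FFFF:"))  # prints: 0x4301 (':FFFF:', 'T9C')
```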

Including  negation  might  make  it  desirable  to  change  the  scheduling 
procedures.  That  possibility  requires  further  investigation. 
9.0  System Errors

There  are  six  locations  in  the  SRCH  and  PRNT  routines  from  which  errors
in  programming  or  data  base  formatting  can  cause  the  message  "ERR--SYSTEM"  to
be  printed.  These  locations  are  listed  below  together  with  a  description  of  the
condition  which  causes  the  failure.  The  message  itself  does  not,  at  present,
indicate  which  error  has  occurred.


STATEMENT  LABEL 


CAUSE  OF  FAILURE 


SRCH11 


SRCH21 


PRNT11 


PRNT15AA 


PRNT17D 


PRNT17M 


Failure  to  find  the  Level  0  start  character  after  the 
sector  in  which  it  occurs  has  just  been  read  from  disk 
or  verified  as  present  in  core. 

Failure  to  find  a  paragraph  symbol  (:EC:)  marking  the 
start  of  the  next  Level  1  search  context  (Type  IV 
search).  A  paragraph  symbol  should  always  be  detected 
even  if  it  follows  the  end-of-buffer  character  (:88:). 

Failure  to  find  a  context  start  symbol  for  searching  by 
PRNT. 

Failure  to  find  a  paragraph  symbol  which  should  occur 
within  a  larger  context  for  searching  by  PRNT. 

Improper  address  calculation  in  preparation  to  read 
disk  sector  containing  start  of  BODY. 

Failure  to  find  start  of  BODY  character  after  verifying 
its  presence  in  core. 
10.0  S-Level  Debugging  Facilities:  the  DUMP  and  CHANGE  Instructions

Two  special  instructions,  DUMP  and  CHANGE,  have  been  included  in  the 
initial  implementation  of  the  retrieval  program  to  assist  in  debugging.  These 
instructions  are  intended  for  use  by  system  maintenance  personnel,  and  in  the 
interest  of  economy  of  storage  and  programming  effort,  very  rigid  input  formats 
must  be  observed.  Both  of  these  instructions  should  be  omitted  from  "production" 
versions  of  the  program  since  both  can  cause  alteration  of  the  S-Memory. 

10.1  DUMP  Instruction 

The  DUMP  instruction  causes  the  printing  of  a  hexadecimal  dump  of  a 
selected  portion  of  the  S-Memory.  The  output  contains  16  words  per  line,  with 
the  address  of  the  first  word  on  each  line  divisible  by  16. 
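The output layout can be sketched as follows, assuming (hypothetically) that S-Memory is modelled as a Python list of 16-bit integers indexed by word address and that each line shows its starting address followed by its 16 words. Because every line starts at an address divisible by 16, the first line is rounded down to a multiple of 16.

```python
def hex_dump(memory, start, count):
    """Format 16-bit words as hexadecimal dump lines of 16 words each.

    memory is a hypothetical list of ints indexed by word address.
    """
    first = start - start % 16  # round down to an address divisible by 16
    stop = start + count        # one past the last requested word
    lines = []
    for base in range(first, stop, 16):
        words = memory[base:base + 16]
        lines.append(f"{base:04X}  " + " ".join(f"{w:04X}" for w in words))
    return lines
```

A request for four words beginning at word 18 therefore prints the whole line that starts at address 0010.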

The  input  format  for  the  DUMP  instruction  is 

DUMP_XXXXZZZZ 

where  "DUMP"  must  be  the  first  four  characters  on  the  input  line  and  must  be
followed  by  exactly  one  blank.  "X"  and  "Z"  in  this  definition  represent
hexadecimal  characters  (0-9,  A-F),  and  the  string  "XXXXZZZZ"  contains  the  actual
object  code  for  the  desired  instruction  (see  appropriate  microcode  documentation).
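A format check for the DUMP input line might look like the sketch below. Tolerating trailing blanks (for example a carriage return) is an assumption; every other deviation from "DUMP", one blank, and eight hexadecimal characters is rejected, and the eight characters are returned as two 16-bit words.

```python
import re

# "DUMP" in the first four columns, exactly one blank, then eight hex digits.
DUMP_FORMAT = re.compile(r"DUMP ([0-9A-F]{4})([0-9A-F]{4})")


def parse_dump(line):
    """Return the object code XXXXZZZZ as two 16-bit words, or None if malformed."""
    match = DUMP_FORMAT.fullmatch(line.rstrip())  # assumption: trailing blanks allowed
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)
```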

10.2  CHANGE  Instruction 

The  CHANGE  instruction  allows  the  user  to  change  the  contents  of 
selected  words  in  S-Memory.  Its  format  is 

CHANGE_XXXX_Y_ZZZZ...ZZ

where  X,  Y  and  Z  are  hexadecimal  characters.  Again,  the  characters  "CHANGE"  must
be  the  first  six  characters  on  the  input  line,  and  blanks  (shown  as  "_")  must
appear  exactly  as  shown.

"XXXX"          is  the  address  of  the  first  word  to  be  changed.

"Y"             is  the  number  of  consecutive  words  to  be  changed  (0-9).

"ZZZZ...ZZ"     is  the  hexadecimal  character  string  to  be  loaded  at
                "XXXX",  four  characters  per  16-bit  word.
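The sketch below shows how such a line could be checked and applied. The field layout "CHANGE XXXX Y ZZZZ...ZZ", with one blank between fields, is an assumption reconstructed from the scanned format line, and S-Memory is modelled as a hypothetical Python list of 16-bit integers.

```python
import re

# Assumed layout, reconstructed from the scanned format line:
#   CHANGE XXXX Y ZZZZ...ZZ   (exactly one blank between the fields)
CHANGE_FORMAT = re.compile(r"CHANGE ([0-9A-F]{4}) ([0-9])(?: ([0-9A-F]+))?")


def apply_change(line, memory):
    """Load the hex string into Y consecutive 16-bit words starting at XXXX.

    memory is a hypothetical list of ints standing in for S-Memory.
    """
    match = CHANGE_FORMAT.fullmatch(line.rstrip())
    if match is None:
        raise ValueError("input does not follow the CHANGE format")
    address = int(match.group(1), 16)
    count = int(match.group(2))
    data = match.group(3) or ""
    if len(data) != 4 * count:
        raise ValueError("expected four hex characters per word")
    for i in range(count):
        memory[address + i] = int(data[4 * i:4 * i + 4], 16)
    return memory
```

Checking that the string holds exactly four characters per word before storing anything means a malformed request leaves memory untouched.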
APPENDIX  A
Flow  Charts 

This  section  contains  detailed  flow  charts  for  the  sixteen  processing 
routines  that  make  up  the  retrieval  control  program.  The  first  diagram  shows 
the  overall  structure  of  the  program  and  the  interrelationships  which  exist 
among  the  parts.  When  one  routine  is  shown  below  another  and  connected  to  it 
by  a  vertical  line,  the  first  procedure  is  called  by  the  second.  The  remainder 
of  the  section  contains  standard  flow  diagrams. 
[Flow chart: SUPERVISOR and MARKERR. SUP01 writes 'READY', reads an
instruction from the TTY, initializes registers for SUPERVISOR, FIND and
SHALG, and locates the first non-blank character. TSRCH identifies the
instruction keyword, and control passes to PRINT, FIND (SFND01), DUMP
(SDMP01) or CHANGE; DUMP and CHANGE are marked as temporary debugging aids,
and an invalid keyword leads to SUP50. MARKERR marks the error position on
the input line and SUP55 writes the error message.]

NOTE: THE LABEL "SPNT01" IS AVAILABLE FOR FUTURE IMPLEMENTATION OF AN
INDEPENDENT "PRINT" INSTRUCTION.
[Flow chart: FIND (fragment). FIND60 writes the 'STRTXT TABLE FULL' message
and returns to SUP01. Other boxes complete the STRTXT Table entry and lead
to FIND03.]

[Flow chart: CNTXT. Moves the requested context start and end characters to
the "FIND" CTXT registers and returns to SUP01.]


[Flow chart: SHALG, part 1 (SHALG02-SHALG37). SHALG02 initializes pointers
and counters for processing the next 'chain' of tags. SHALG03 counts, on the
next chain, (a) the number of unique tags and (b) the number of tags which
represent strings only. SHALG10 lists in a stack all unique tags, together
with the number of strings and the number of parenthesized quantities
represented by each, as two separate groups with the 'strings only' tags
first. SHALG24G sorts the tags representing both strings and parenthesized
quantities (Group II tags) by number of strings, fewest to most. SHALG25A,
SHALG29 and SHALG30 change the sort key pointers, sort the Group II tags by
number of parenthesized quantities (fewest to most), and restore the sort
key pointers. SHALG32-SHALG37 prepare to sort entities with identical tags,
entering in Stack 1 a Tag Table pointer, the length of the string and a sort
key pointer, and in Stack 2 a Tag Table pointer, the number of strings
enclosed and a sort key pointer. On stack overflow, SHALG90 writes an error
message and returns with an error to SUP01.]


AT THIS POINT, TAGS ON THE CHAIN CURRENTLY BEING PROCESSED HAVE BEEN SORTED
INTO THE FOLLOWING ORDER:

A.  TAGS WHICH REPRESENT A SINGLE STRING, ARRANGED FROM SHORTEST STRING
    TO LONGEST

B.  TAGS WHICH REPRESENT SEVERAL STRINGS (IN CONJUNCTION) BUT NO
    PARENTHESIZED EXPRESSIONS, ARRANGED FROM FEWEST STRINGS TO MOST

C.  TAGS WHICH REPRESENT PARENTHESIZED QUANTITIES AND (POSSIBLY) STRINGS,
    ARRANGED FIRST FROM FEWEST STRINGS TO MOST AND THEN FROM FEWEST
    PARENTHESIZED EXPRESSIONS TO MOST

THE CODE WHICH FOLLOWS WILL, FOR EACH TAG, ARRANGE STRINGS FROM LONGEST TO
SHORTEST AND PARENTHESIZED EXPRESSIONS FROM FEWEST ENCLOSED STRINGS TO MOST
[Flow chart: SHALG, continued (SHALG39-SHALG41). SORT arranges Stack 1 from
short to long, and SHALG39 reverses the order so that strings run from long
to short. SHALG40 sorts Stack 2 from fewest strings to most. SHALG41
replaces the old chain with a new one linking the tags in sorted order, and
the routine then prepares to process the next chain.]
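The ordering described in the notes above can be sketched with a sort key. The `Tag` class and its fields are hypothetical, Python's built-in sort stands in for the routine's own SORT procedure, and the tuple key for group C (number of strings first, then number of parenthesized expressions) is one reading of "arranged first from fewest strings to most and then from fewest parenthesized expressions to most".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tag:
    name: str
    strings: List[str] = field(default_factory=list)      # strings carrying this tag
    paren_sizes: List[int] = field(default_factory=list)  # strings enclosed by each parenthesized expression


def chain_key(tag):
    """Group A, then B, then C (assumed tuple ordering within each group)."""
    if not tag.paren_sizes:
        if len(tag.strings) == 1:
            return (0, len(tag.strings[0]), 0)          # A: single string, shortest first
        return (1, len(tag.strings), 0)                 # B: several strings, fewest first
    return (2, len(tag.strings), len(tag.paren_sizes))  # C: strings, then expressions


def order_chain(tags):
    ordered = sorted(tags, key=chain_key)
    for tag in ordered:
        tag.strings.sort(key=len, reverse=True)  # longest string first within a tag
        tag.paren_sizes.sort()                   # fewest enclosed strings first
    return ordered
```

Note that the sketch sorts each tag's own lists in place, mirroring the way SHALG rearranges each chain rather than building a copy.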
[Flow chart: SHALG, SUCCESS/FAILURE assignments (SHALG43-SHALG60). SHALG43
prepares for the SUCCESS/FAILURE assignments and SHALG45 prepares to process
the next entity; SHALG52 begins a stack entry for the current entity, and
SHALG60 pops the stack and returns to the next lower level. SHALG46 calls
LWTAG: if the current entity is not the last with the current tag, its
SUCCESS pointer (SHALG50) is the next entity with the current tag; otherwise
it is either the search success symbol or the same as the next lower level
(SHALG51, taken from the stack). SHALG47 calls LONCHN: if the current tag is
not the last unique tag on the chain, the FAILURE pointer is the first entity
in the current chain having a different tag; otherwise it is either the
search failure symbol or the same as the next lower level (SHALG49, taken
from the stack).]
[Flow chart: LWTAG (boxes SHALG53 and SHALG54; box text not legible in the
scan).]
[Flow chart: SORT (SORT01-SORT04). Boxes: OLD LOW PTR = LOW PTR and
INDEX PTR = LOW PTR + STEP; OLD LOW PTR = INDEX PTR; interchange LOW PTR and
INDEX PTR, then LOW PTR = LOW PTR + STEP (SORT04); INDEX PTR = INDEX PTR +
STEP (SORT03); return.]

NOTES:  INPUTS:  LOW PTR  = ADDRESS OF FIRST ELEMENT IN SORT LIST
                 HIGH PTR = ADDRESS OF LAST ELEMENT IN SORT LIST
                 STEP     = ADDRESS INCREMENT BETWEEN ELEMENTS

        EACH "ELEMENT" IS A POINTER TO A SORT KEY. ELEMENTS, BUT NOT
        SORT RECORDS, GET PHYSICALLY REARRANGED IN MEMORY.

        OUTPUT: LIST OF ELEMENTS ARRANGED IN ORDER OF INCREASING SORT KEY
        VALUES
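The pointer-sorting idea in these notes can be sketched as below. A simple exchange sort is assumed here (the exact comparison sequence of the chart is not fully legible), list positions advance by one where the routine advances addresses by STEP, and the `keys` mapping is a hypothetical stand-in for the sort records that stay in place.

```python
def pointer_sort(pointers, keys):
    """Rearrange pointers so that keys[p] increases; the records never move.

    pointers: list of hypothetical addresses; keys: address -> sort key.
    """
    ptrs = list(pointers)
    for low in range(len(ptrs) - 1):
        for index in range(low + 1, len(ptrs)):
            if keys[ptrs[index]] < keys[ptrs[low]]:
                # Interchange the elements at LOW PTR and INDEX PTR.
                ptrs[low], ptrs[index] = ptrs[index], ptrs[low]
    return ptrs
```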
[Flow chart: LWTAG exits. LWTAG returns through IAR6 if the current entity
is the last with the current tag, and through IAR7 if it is not.]
[Flow chart: LONCHN. LONCHN01 sets TEST LINK to the pointer to the next item
on the chain and TEST TAG to the tag of that item, and both are advanced item
by item. LONCHN returns through IAR6 if the current tag is the last unique
tag on the chain, and through IAR7 if it is not.]
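The SUCCESS/FAILURE assignments made by SHALG with the help of LWTAG and LONCHN can be sketched for a single chain. The sketch reads entities that share a tag as conjoined and different tags on a chain as alternatives; that reading fits the pointers shown in the chart but is an interpretation. The `outer_success` and `outer_failure` parameters are hypothetical stand-ins for the pointers taken from the stack at the next lower level.

```python
SUCCESS, FAILURE = "SEARCH SUCCESS", "SEARCH FAILURE"  # terminal symbols


def assign_links(chain, outer_success=SUCCESS, outer_failure=FAILURE):
    """Assign (SUCCESS, FAILURE) pointers for one sorted chain of entities.

    chain: list of (entity, tag) pairs in which entities sharing a tag are
    adjacent, as they are after SHALG rebuilds the chain.
    """
    links = {}
    for i, (entity, tag) in enumerate(chain):
        later = chain[i + 1:]
        # Not the last with the current tag (LWTAG): go on to the next one.
        same = next((e for e, t in later if t == tag), None)
        # Not the last unique tag (LONCHN): first entity with a different tag.
        other = next((e for e, t in later if t != tag), None)
        links[entity] = (
            same if same is not None else outer_success,
            other if other is not None else outer_failure,
        )
    return links
```

For the chain a, b (tag 1) and c (tag 2), a failure of either a or b falls through to the alternative c, while success of b completes the search.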


[Flow chart: SEARCH MONITOR (SRCH01-SRCH22). SRCH01 initializes flags 0-14
to 0 and flag 15 (search success) to 1; an illegal keyword is an error
(SUP55). Flags are set in CTR11 according to the requested contexts (SRCH04:
are the Level 1 delimiters paragraphs? SRCH06: is the FIND context a
sentence?), and SRCH07 performs initialization for SRCH and PRNT. At SRCH08
the monitor returns if the last document has been processed; otherwise it
resets all FIND and search control flags, calculates the disk address of the
next document, and calls NBUF03 to input the first section (memory load) of
that document. NBUF01 gets the required sector in core and sets pointers as
needed, and SRCH11 uses SEARCHF to find the Level 0 start character (failure
is a system error). CTR2 holds the number of disk sectors yet to be processed
in Level 0 and is used to determine when the end of Level 0 is in core.
Depending on the context specified, the Level 1 search is either a full
search on that context (STB) or a partial search that sets flags on success
and continues (STA); a flag may be set to search for terms in their original
order. SRCH13 sets the end-of-search character to the end of Level 0 if it is
in core, otherwise to the end of the buffer, and SRCH14 calls HUNT01 to
search the full buffer (Level 1) for all strings in original order. Other
boxes test PTR1 = 0 (SRCH15); reset the document success flag and the 'hit'
flags in the Tag Table and set the search success flag; call PRINT and issue
the CONTINUE? query (SRCH80), with exits to abandon the search or to get a new
document (SRCH08); compute CTR2 = CTR2 - CTR1 (SRCH16) and CNTRL01 =
CNTRL01 + CTR1; set CTR1 = CTR2 + 1 if CTR2 < 3, otherwise CTR1 = 4 (SRCH17);
and call NBUF03 to get the next buffer. At SRCH20 and SRCH20A the flag for
"return on first hit" may be set and the end-of-search character is set to
the end of Level 0. SRCH21 uses SEARCHF to find the start and end of the
first (next) Level 1 search context, leaving the addresses in SRCHSTRT and
ADENDL1 and saving the end-of-Level-1 search character (failure is a system
error). SRCH22 calls HUNT to search the Level 1 context for all strings in
"revised" order, except that in Case III HUNT returns after the first hit.
Several boxes refer to Notes 2-4 below.]


NOTES:

1.  CASE III, LEVEL 1 AND CASE IV, LEVELS 1 AND 2 REQUIRE THE BUFFER TO END ON
    PARAGRAPH OR SENTENCE DELIMITERS. ALWAYS END THE TEXT BUFFER WITH
    :88EEEC:, I.E., DEFINE THE END OF BUFFER TO BE BOTH END OF PARAGRAPH AND
    END OF SENTENCE.

2.  END OF SEARCH = END OF BUFFER (ALREADY SET).

3.  END OF SEARCH = END OF LEVEL 1 CONTEXT = PARAGRAPH DELIMITER (ALREADY SET).

4.  END OF SEARCH = END OF LEVEL 1 CONTEXT = END OF FIND CONTEXT (ALREADY SET).
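Note 1 describes a sentinel technique: because every buffer ends with :88EEEC:, a forward scan for a sentence or paragraph delimiter can never run past the end of the buffer. The sketch below illustrates the idea on a Python `bytes` object. Treating :EE: as the sentence delimiter is an inference from Note 1 (:EC: is the paragraph symbol named in Section 9.0), and the plain ASCII text is a hypothetical stand-in for the data base encoding.

```python
END_OF_BUFFER = 0x88    # :88:
END_OF_SENTENCE = 0xEE  # :EE:, inferred from the :88EEEC: terminator in Note 1
PARAGRAPH = 0xEC        # :EC:, the paragraph symbol


def terminate_buffer(text):
    """Append :88EEEC: so the buffer ends on both a sentence and a paragraph delimiter."""
    return bytes(text) + bytes([END_OF_BUFFER, END_OF_SENTENCE, PARAGRAPH])


def next_delimiter(buffer, start, delimiter):
    """Forward scan for a delimiter; cannot run off the end of a terminated buffer."""
    return buffer.index(delimiter, start)
```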
[Flow chart: SEARCH MONITOR, continued (SRCH23-SRCH25). SRCH23 saves
SRCHSTRT and the Tag Table pointer for the Level 1 search, then uses SEARCHR
to locate the start of the Level 2 context, leaving its address in SRCHSTRT.
SRCH24 resets the 'document success' and 'first hit return' flags, saves the
old end-of-search character, sets the new end-of-search character to the end
of the Level 2 context, and initializes the Tag Table pointer. SRCH25 calls
HUNT to search Level 2 for all strings in "revised" order.]
[Flow chart: SEARCH MONITOR, continued (SRCH26-SRCH28). SRCH26 tests for
success; on success the 'document success' flag is reset and the 'search
success' flag is set. Other boxes: PTR2 = address of the start of the next
Level 2 context; set PTR2 to the second character after the end of the last
context printed; abandon the search (SRCH08, get new document); restore the
parameters so that the Level 1 search resumes exactly where it stopped; set
the HUNT return address as for the original Level 1 call (a 'non-standard'
return at HUNT03A, giving the same action and the same return as for the
original Level 1 search call). SRCH27 tests PTR2 >= ADENDL1, i.e., whether
the end of the Level 1 context has been reached; connectors lead to SRCH25
and SRCH31.]
[Flow chart: SEARCH MONITOR, continued (SRCH29-SRCH30F). Boxes: CTR2 =
CTR2 - CTR1; CNTRL01 = CNTRL01 + CTR1; CTR1 = CTR2 + 1 or CTR1 = 4 (SRCH30C);
SEARCHF to the end of Level 0 or of the Level 1 context, whichever comes
first, saving the address of the end of Level 1 (SRCHSTRT and ADENDL1 are
involved); NBUF03 gets the next buffer (SRCH30F), with processing continuing
at SRCH20A. If the PRINT routine has been called for this document, a flag
is set for the last call to PRINT and PRINT is called; connectors lead to
SRCH22 and to SRCH08 (get new document).]
[Flow chart: NBUF (NBUF01, NBUF02). If the desired sector is in core, NBUF05
sets PTR6 (indirect, two levels) to the S-Memory start address of the
desired sector; the result is meaningless if that sector is not in core.
Otherwise F1 is computed from the disk sector addresses of the start of the
current buffer and of the start of the desired sector; CNTRL01 = CNTRL01 + F1
becomes the disk address of the first sector to be read (no change is needed
in CNTRL02); CNTRL19 - CNTRL01 is tested, and CTR1 is set to F1 + 1 or to 4.
NBUF02 then sets the parameters for a disk read beginning with the desired
sector and calls NBUF03.]

CALLING CONVENTIONS (NBUF03):

CTR1    = N = NUMBER OF SECTORS TO BE READ
CNTRL01 = DISK ADDRESS OF FIRST SECTOR TO BE READ
CNTRL02 = S-MEMORY ADDRESS FOR FIRST CHARACTER READ
[Flow chart: NBUF03. Reads CTR1 disk sectors from *CNTRL01 into *CNTRL02;
NBUF04 sets PTR6 = CNTRL02; NBUF05 sets PTR6 = *PTR6 = S-Memory address of
the start of the first sector in core; the Tag Table pointer is initialized
and the routine returns.]
[Flow chart: HUNT (HUNT01-HUNT04). HUNT01 sets PTR2 = SRCHSTRT and HUNT02
saves the Tag Table pointer. The next search string is selected either by
NEXTM01 (PTR1 set to the start of the next search string using the 'revised'
input order) or by NEXTL01 (using the 'original' input order), and the 'hit'
flag at the previous Tag Table entry is reset. Other boxes set the 'document
success' flag and return; HUNT04 sets the 'FIND' flag in the Tag Table for
the current string before returning.]
[Flow chart: NEXTL (NEXTL01, NEXTL02). NEXTL01 sets the Tag Table pointer to
the next tag; NEXTL02 sets PTR1 to the string address for the next tag. PTR1
= 0 is tested; otherwise PTR1 is set to the start address for the next string
before returning.]
[Flow chart: NEXTM (NEXTM02-NEXTM05). Depending on the outcome for the
previous string, the Tag Table pointer is set to the SUCCEED link or to the
FAIL link for that string, and the 'document success' flag may be set.
NEXTM05 sets the Tag Table pointer to the next tag using the SUCCEED or FAIL
link; PTR1 = 0 indicates search completion (success or failure). Otherwise
PTR1 is set to the string address for the next tag; NEXTN tests whether the
tag represents a parenthesized expression and, if not, PTR1 is set to the
start address for the next string before returning.]
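The way NEXTM follows SUCCEED and FAIL links until a terminal symbol is reached can be sketched as a small loop. The `links` mapping (entity to its two links), the `found` predicate standing in for the buffer search done by HUNT, and the terminal symbol names are all hypothetical.

```python
SUCCESS, FAILURE = "SEARCH SUCCESS", "SEARCH FAILURE"  # terminal symbols


def run_links(start, links, found):
    """Follow SUCCEED/FAIL links from the first entity to a terminal symbol.

    links: entity -> (succeed_link, fail_link); found(entity) -> bool.
    Returns (request satisfied?, entities visited in order).
    """
    visited = []
    node = start
    while node not in (SUCCESS, FAILURE):
        visited.append(node)
        succeed_link, fail_link = links[node]
        node = succeed_link if found(node) else fail_link
    return node == SUCCESS, visited
```

Strings that the links skip are never searched for, which is the point of ordering the chain so that the cheapest tests come first.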


[Flow chart: PRINT MONITOR (PRNT01-PRNT06). PRNT saves all data needed to
restore the buffer; initialization sets the 'first entry' flag and prepares
the other flags. PRNT06 searches the print table for the next print context,
and PRNT01 resets the 'outdent' flag. The print context identifiers are moved
to the print context registers and the flag for "find context larger than
print context" is reset (PRNT05).]
[Flow chart: PRNT07C-PRNT11. PRNT07C compares CTXTEND with the print context
start. PRNT08 tests whether both paragraph and sentence are requested and, if
so, selects paragraph and rejects sentence; CTXTEND = print context end.
PRNT07F sets a flag indicating that the print context is a major section or
data subsection, while PRNT08F resets it to indicate a minor section;
CTXTEND < :C9: is tested. The print context delimiters are moved to the
print context registers; PRNT09 calls NBUF01 to read the required section if
it is not in core, and PRNT11 uses SEARCHF to find the start of the PRNT-SRCH
section, a failure being a system error (SRCH90). Connectors lead back to
PRNT06 and on to PRNT13.]
[Flow chart: PRNT13-PRNT15A. Depending on the context, the search end
character is set to end-of-paragraph (PRNT15), end-of-buffer (PRNT16) or
end-of-print-search context (PRNT14). SEARCHF locates the start of the first
paragraph (failure is a system error), and SEARCHF searches to the end of the
print-search context, leaving its address in PTR14. Processing continues at
PRNT19.]
[Flow chart: PRNT17-PRNT17P. PRNT17 sets the flag "find context is larger
than print context" and makes the PRNT-SRCH context equal to the find
context; the original buffer is restored. PRNT17C compares the find start
with CTXTEND and may set the PRNT-SRCH start to CTXTEND (PRNT08I). PRNT17L
sets the PRNT-SRCH start to the abstract start, uses SEARCHF to find the end
of the abstract and saves its address, and sets CTXTEND to the PRNT-SRCH
context end, continuing at PRNT09. Other boxes: NBUF03 reads the start of
BODY from disk; PRNT17P sets the search start pointer and ADENDL1 at the start
of BODY; the "no print" flag is set.]

*NOTE: IF A HIT IS IN THE ABSTRACT AND THE ABSTRACT HAS ALREADY BEEN PRINTED,
REJECT THE HIT AND RESUME THE SEARCH AT THE START OF BODY.
[Flow chart: PRNT17S-PRNT21. PRNT17S tests whether CTXTEND equals the Level 0
end. PRNT18A sets the PRNT-SRCH context to the paragraph or to the print
context, sets the flag "find context is larger than print context", sets
SRCHSTRT, and moves the PRNT-SRCH context end character to the search end
register. PRNT19 sets the 'first hit' and 'original order' flags, and PRNT20
searches for a string. SEARCHR moves back to the first blank, outdent
character, or PRNT-SRCH context end character before the end of the string;
MOVEF inserts an 'outdent' character at the blank; the 'outdent' flag is set,
the 'document success' flag is reset, and the search for the string continues
(flag 14 is tested). PRNT21 uses SEARCHF to find the end of the PRNT-SRCH
context and saves its address, and the 'original order' and 'first hit' flags
are reset.]
[Flow chart: PRNT24, PRNT25. PRNT25 prepares for a call to NBUF, and NBUF01
gets the next buffer before processing continues at PRNT13; another exit
returns.]
[Flow chart: BLKP (Block and Print), BLKP01-BLKP04. BLKP01 adjusts the line
start and end pointers, setting L-END = L-START + L-LENGTH - 5. If L-END is
before the end of the current search unit, BLKP02 uses SEARCHR to find the
first blank before L-END and leaves its address in L-END; otherwise L-END =
end of search unit. The original character at the end of the print line is
saved, and MOVEF inserts an "end of print line" character at L-END. The
original 5 characters at the start of the print line are saved, and either
blanks (MOVEL) or the '**---' marker (MOVEF, BLKP04) are moved to L-START.
The line is printed, the original 5 characters at the start and 1 character
at the end of the print line are restored, and L-START = L-END + 1. The cycle
repeats while L-START is before the end of the current search unit; BLKP then
returns.]
APPENDIX  B

Register  and  Flag  Assignments 

A.  Assignments  for  SUPERVISOR,  FIND  Decode,  PRINT  Decode,  SHALG,  and  associated 
procedures.  Unlisted  registers  are  not  used. 

IAR Registers

IAR6      Return address from MARKERR; alternate return address from LWTAG
          and LONCHN

IAR7      Return address from SORT, LWTAG, LONCHN

IAR8      Return addresses from major routines (FIND, SHALG, SRCH, etc.)

IAR10     =SUP50-1: Transfer address for printing "Invalid character or
          Keyword" error message

IAR11     "FAILURE" return address for TSRCH (searches in RESKEY and CTXT
          Tables)

IAR12     "SUCCESS" return address for TSRCH

IAR13     =PRINT02-1, =PRINT05-1 or =FIND03-1: New character processing in
          decode routines

IAR14     Error message address for "missing (')" exit from SEARCHF
          instruction in LTSTRING processing

IAR15     =LTSTR02-1: Successful comparison exit from COMPARE instruction in
          LTSTRING processing

PTR Registers

PTR0      In FIND: Temporary "new" character address in LEGALTAB

          In SHALG: Tag Table address of first entity with current tag

PTR1      In FIND and PRINT Decode: Byte address of current input character

          In SDMP, SCHNG, and PACK: Word address of next word to be processed

          In SHALG: "Chain locator" -- Normally points to second column of
          Tag Table entry for parenthesized expression at start of
          current chain

          In LWTAG, LONCHN: Temporary pointer

PTR2      Points to Hex :81: in CHR14. (Used to move Hex :81: ahead of
          alphanumeric keystrings for table searching)

PTR3      Points to Hex :82: in CHR13. (Used in TSRCH to check for :82:
          at end of data string)

PTR4      Points to Hex :20: (Blank) in CHR11

PTR5      In MARKERR: Data pointer for error message preparation
          In SHALG: Utility pointer into Tag Table

PTR6      In PRINT Decode: Local functions in initial sections
          In SHALG: Utility; special stack pointer
          In SHALG, LWTAG, LONCHN: Pointer to current tag

PTR7      In SHALG: Utility pointer in Tag Table and Stack

PTR8      In PRINT Decode: CTXT Table pointer for "Clear Context Table"
          procedure

          In SHALG: Next location to receive a link in constructing
          reordered chain and in SUCCESS/FAILURE assignments

PTR9      Points to Hex :80:, the End-of-Table symbol, in CHR15

PTR10     Points to Hex :27: (Apostrophe) in CHR2

          In SHALG: Used in stacking individual entities

PTR11     In SHALG: Utility; special stack pointer

PTR12     In FIND: Standard Data Pointer for string instructions (Key
          Pointer in "skip blanks" instructions)

          In SHALG: Temporary stack pointer

PTR13     Start of character table, "CHARS"

PTR14     Current legal character address in LEGALTAB

PTR15     In TSRCH: Temporary storage for contents of PTR1

          In SDMP, SCHNG and PACK: Points to DUMP instruction under
          construction or to S-Memory words being changed
CHR Registers

CHR0      "FIND" context start character

CHR1      "FIND" context end character

CHR2      In FIND Decode: Hex :0027: (Apostrophe)
          In PRINT Decode: Hex :002C: (Comma)

[label illegible]  Hex :005E: (^, for error messages in MARKERR)

CHR11     Hex :0020: (Blank)

CHR12     Hex :000D: (Carriage Return)

CHR13     Hex :0082: Alphanumeric string suffix in keyword-type tables

CHR14     Hex :0081: Alphanumeric string prefix in keyword-type tables
          (including teletype input line, TTYIN)

CHR15     Hex :0080: End-of-table symbol

CTR Registers

CTR0      Character identification in PRINT Decode and FIND Decode

CTR1      In FIND: Total character count for requested search term

          In SHALG: Counter--total number of different tags on current chain

CTR2      In MARKERR: Character count for error message preparation

          In SHALG: Counter--number of parenthesized expressions with
          current tag

CTR3      In SHALG: Counter--number of tags on current chain which represent
          strings only

CTR4      In SHALG: Counter--number of strings associated with current tag

CTR5      In SHALG: Counter--number of strings inside a "new" parenthesized
          expression

CTR6      In SHALG: Temporary storage to assist in setting up stack pointers;
          total number of entities on current chain

CTR7      Hex :008D: (Used in CTXT Table to identify contexts selected for
          printing)

CTR9      In SDMP, SCHNG, and PACK: Counter for number of passes through
          PACK procedure
CTR9      In SHALG: Total number of entities on current chain--used as
          counter in constructing "sorted" chain in Tag Table

CTR15     Hex :0000:

FLAGS (Low-order bit is numbered 15)

BIT15     Search Success: On return from search, 1 implies NOT FOUND,
          0 implies at least one responding document

          In SHALG: Indicator for "local" conditions


B.  Assignments for SRCH, PRNT, and associated procedures.


IAR Registers

IAR0      In SRCH: Used as return address set by DECRV instruction

IAR1      =SRCH80-1: Transfer address for "CONTINUE?" message

IAR3      Return address from BLKP (Block & Print)

IAR4      Return address from NEXTM

IAR5      Return address from NEXTL

IAR6      Return address from HUNT, NBUF

IAR7      Return address from PRNT

IAR8      Return address to SUPERVISOR

IAR9      =SRCH32-1: "FIND Level-0 end character" exit from SEARCHF
          instruction at SRCH31A

IAR11     "FAIL" transfer address for FIND instruction at HUNT03A

IAR13     =PRNT23B: "No more outdent marks" exit for SEARCHF instruction
          at PRNT22

IAR14     =SRCH90-1: Transfer address for "SYSERR" message

IAR15     In PRNT: Temporary storage for PTR2 during some calls to BLKP


PTR Registers

PTR0      In SRCH: "ADENDL1" (Address of the End of the Level 1 search
          context)

PTR1      In BLKP: Start of print buffer

          In SRCH: Text of string currently sought; temporary storage

          In NEXTL, NEXTM: Return argument--PTR1 = pointer to next string;
          if none, then PTR1 = 0

PTR2      In BLKP: End of print context

          In SRCH: Search start pointer (Different from search start
          variable, SRCHSTRT, in PTR6)

PTR3      In SRCH, HUNT, NEXTL, NEXTM: Current Tag Table entry

PTR4      In BLKP and PRNT: Temporary storage (the two uses are independent
          of one another)

PTR5      In PRNT: Points to <CTR5> (Hex :84:, outdent symbol)

PTR6      SRCHSTRT

PTR7      In PRNT and BLKP: Points to <CTR4> (Hex :89:, end of print
          buffer symbol)

PTR8      In PRNT: Print context table pointer

PTR9      Points to :80: (End-of-table symbol) in CHR15

PTR11     In HUNT: Temporary storage for PTR3 during calls to NEXTM

          In PRNT: Temporary; used to transfer context delimiters from
          CTXT Table to CHR registers; search pointer for
          "Insert 'MARK LINE' Symbol"

PTR13     In BLKP: <SAVEBLOCK> (used to save and restore characters displaced
          by (**---) or ( ) or (:89:) in print buffer)

PTR14     In PRNT: Points to End-of-print context, if in core (large value,
          otherwise)

PTR15     In BLKP: Points to <LINEMARK> or <LINEMARK+1>
CHR Registers

CHR0      "FIND" context start character

CHR1      "FIND" context end character

CHR2      SEARCH Level-0 start character

CHR3      SEARCH Level-0 end character

CHR4      SEARCH Level-1 start character

CHR5      SEARCH Level-1 end character

CHR6      SEARCH Level-2 start character

CHR7      SEARCH Level-2 end character

CHR8      (current) PRINT context start character

CHR9      (current) PRINT context end character

CHR10     END-of-SEARCH character

CHR11     PRINT-SEARCH context start character

CHR12     PRINT-SEARCH context end character

CHR13     Hex :0020: (Blank)

CHR14     Hex :0088: (End of buffer symbol)

CHR15     Hex :0080: (End of table symbol; end of STRING in STRTXT Table)


CTR  Registers 

CTRO  Used  by  every  string  instruction 

IN  PRNT:  number  of  characters  for  printed  line 

CTR1  Number  of  disk  sectors  currently  in  core  (or  number  about  to  be  read) 

CTR2  Number  of  disk  sectors  remaining  to  process 

CTR3  In  NBUF:  Character  at  start  of  required  text 

CTR4  Hex  :0089:   (End  of  print  buffer) 

CTR5  Hex  :0084:   (Outdent  symbol) 

CTR7  Hex  :008D:   (Used  by  PRNT  in  searching  CTXT  Table) 

CTR8       In PRNT: CTXTEND (End character for current major print CTXT or
           last major CTXT printed)

CTR10      Print  control  flags  (Copy  of  Flags  8-12:  Positions  10-12  may  be 
changed  by  PRNT  as  context  changes) 

CTR11  Reserved  for  search  control  flags  (Copy  of  Flags  8-12) 

CTR12  Hit  count  (Used  in  conjunction  with  "CONTINUE?"  message) 

CTR13  125  (Maximum  number  of  text  characters/printed  line) 

CTR14  Hex  :0004: 

CTR15  Hex  :0000: 
FLAGS  (Low-order bit is numbered 15)

BIT0      First Entry (to PRNT)

BIT1      Last Entry (to PRNT)

BIT2      Buffer  changed  (Used  in  PRNT) 

BIT3      Special  condition  indicator:  This  bit  =  1  if  and  only  if 
"PRINT"  context  is  minor  (sentence  or  paragraph)  and  "FIND" 
context  is  major  or  minor,  but  larger  than  "PRINT"  context 

BIT4      Outdent  Flag 

BIT6       If  this  bit  =  1,  return  from  first  call  to  PRNT  before 
printing  individual  sentences  or  paragraphs 

BIT7       First  Hit  (Return  from  HUNT  after  first  success  in  locating 
any  requested  string) 

BIT8*     "FIND" is a major context section

BIT9*     "FIND" is SENTENCE

BIT10*    (Current) "PRINT" context is a major context section

BIT11*    (Current) "PRINT" context is SENTENCE

BIT12*    "FIND" context is a Bibliographic Data subsection

BIT13     Search Mode: 0 implies "Revised order"; 1 implies "Original order"

BIT14     Document Success: If this bit = 1, the document currently
          being searched satisfies the search request

BIT15     Search Success: If this bit = 0 after a search, some document
          in the data base satisfies the search request


* Flag bits 8-12 appear in Counter Register 11 instead of the variable FLAGS.
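The FLAGS word numbers its bits from the high-order end (bit 0) down to the low-order end (bit 15), the reverse of the usual modern convention. The Python sketch below illustrates how a bit number in the table maps to a mask. It assumes a 16-bit word, which the numbering 0-15 suggests but this section does not state. The helper names (flag_mask, flag_test, context_bits, and so on) are hypothetical and do not come from the report. The table gives no placement for the copy of bits 8-12 inside CTR11, so the sketch simply isolates those bits where they sit.

```python
# Sketch of the FLAGS bit-numbering convention (hypothetical helpers).
# Assumption: a 16-bit word, bit 0 = high-order bit, bit 15 = low-order bit,
# so flag bit n corresponds to the mask 1 << (15 - n).

WORD_BITS = 16  # assumed word size


def flag_mask(n: int) -> int:
    """Return the mask for flag bit n (0 = high-order, 15 = low-order)."""
    if not 0 <= n < WORD_BITS:
        raise ValueError(f"bit number out of range: {n}")
    return 1 << (WORD_BITS - 1 - n)


def flag_test(word: int, n: int) -> bool:
    """True if flag bit n is 1 in word."""
    return bool(word & flag_mask(n))


def flag_set(word: int, n: int) -> int:
    """Return word with flag bit n set to 1."""
    return word | flag_mask(n)


def flag_clear(word: int, n: int) -> int:
    """Return word with flag bit n set to 0, kept within 16 bits."""
    return word & ~flag_mask(n) & 0xFFFF


# Bits 8-12 are the flags the table marks with an asterisk.
CONTEXT_MASK = sum(flag_mask(n) for n in range(8, 13))  # 0x00F8


def context_bits(word: int) -> int:
    """Isolate flag bits 8-12, leaving them in their original positions."""
    return word & CONTEXT_MASK


if __name__ == "__main__":
    flags = 0
    flags = flag_set(flags, 0)   # BIT0: First Entry (to PRNT)
    flags = flag_set(flags, 9)   # BIT9: "FIND" is SENTENCE
    flags = flag_set(flags, 14)  # BIT14: Document Success
    print(hex(flags))                # 0x8042
    print(hex(context_bits(flags)))  # 0x40
```

Because bit 15 is the low-order bit, a flag's mask shrinks as its number grows. For example, BIT0 is 0x8000 and BIT15 is 0x0001.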